ON ASYMPTOTICALLY OPTIMAL TESTS UNDER LOSS OF 
IDENTIFIABILITY IN SEMIPARAMETRIC MODELS 

By Rui Song 1 , Michael R. Kosorok 1 and Jason P. Fine 2 

University of North Carolina 

We consider tests of hypotheses when the parameters are not 
identifiable under the null in semiparametric models, where regularity 
conditions for profile likelihood theory fail. Exponential average tests 
based on integrated profile likelihood are constructed and shown to be 
asymptotically optimal under a weighted average power criterion with 
respect to a prior on the nonidentifiable aspect of the model. These 
results extend existing results for parametric models, which involve 
more restrictive assumptions on the form of the alternative than do 
our results. Moreover, the proposed tests accommodate models with 
infinite dimensional nuisance parameters which either may not be 
identifiable or may not be estimable at the usual parametric rate. 
Examples include tests of the presence of a change-point in the Cox 
model with current status data and tests of regression parameters in 
odds-rate models with right censored data. Optimal tests have not 
previously been studied for these scenarios. We study the asymptotic 
distribution of the proposed tests under the null, fixed contiguous 
alternatives and random contiguous alternatives. We also propose a 
weighted bootstrap procedure for computing the critical values of the 
test statistics. The optimal tests perform well in simulation studies, 
where they may exhibit improved power over alternative tests. 

1. Introduction. In this paper we investigate nonstandard testing problems
involving a family of probability distributions $\{P_\theta, \theta \in \Theta\}$, known up
to a parameter $\theta$ in a parameter space $\Theta$. The parameter space is assumed
to be a subset of an infinite-dimensional metric space. The null and
alternative hypotheses are:

$H_0: \theta \in \Theta_0$ vs. $H_1: \theta \in \Theta \setminus \Theta_0$,

where $\Theta_0$ is a subset of $\Theta$ and contains at least two elements. In the usual
testing framework, the parameters are unique under the null so that
identifiability is not an issue. While we allow multiple values of $\theta$ satisfying
the null, we assume that the null distribution, denoted by $P_0$, is unique,
where $\Theta_0 = \{\theta \in \Theta : P_\theta = P_0\}$. Under this setup, the true value of $\theta$ is not
identifiable under the null, since for any $\theta \neq \theta'$ in $\Theta_0$, $P_\theta = P_{\theta'} = P_0$. Such
loss of identifiability occurs in diverse applications in the social, biological, 
physical and medical sciences. We next present two such examples followed 
by a description of the main contributions of this paper. The Introduction 
concludes with a brief outline of the remainder of the paper. 

1.1. Example 1: Univariate frailty regression under right censoring. Let
$T$ be a nonnegative random variable representing the failure time, $C$ be the
independent censoring time, $V = \min(T, C)$ and $Z = Z(\cdot)$ be a corresponding
$p$-dimensional covariate process. The observed data $\{X_i = (V_i, \Delta_i, Z_i), i =
1, \ldots, n\}$ consist of $n$ i.i.d. realizations of $X = (V, \Delta, Z)$, where $\Delta = 1\{T \le
V\}$ and $1\{\cdot\}$ is the indicator function. In this model, the hazard function of the
survival time $T$ given covariates $Z$ is

(1) $\lambda\{t; Z(t), W\} = \eta(t) W \exp\{\beta^T Z(t)\}$,

where $t$ is the time index, $W$ is an unobserved gamma frailty with mean
1 and variance $\zeta$, $\beta$ is a $p$-dimensional regression parameter and $\eta(\cdot)$ is a
completely unspecified baseline hazard function.

When $\beta$ is not zero, the odds-rate model has been treated extensively; see
Kosorok, Lee and Fine (2004), Murphy, Rossini and van der Vaart (1997);
Murphy and van der Vaart (1997, 2000); Parner (1998); Slud and Vonta
(2004), among others. Scharfstein, Tsiatis and Gilbert (1998) considered
semiparametric efficient estimation in the setting where the covariates are time
independent, $\zeta$ is assumed known and $\eta(\cdot)$ is assumed to be absolutely
continuous. Bagdonavicius and Nikulin (1999) considered estimation for a class
of proportional hazards models, which includes the odds-rate model with $\zeta$
unspecified, based on a modified partial likelihood. Kosorok, Lee and Fine
(2004) considered robust inference for odds-rate models when the frailty
distribution and regression covariates may be misspecified. To our knowledge,
problems associated with testing the null $\beta = 0$ when the frailty parameter
is unknown have not been previously considered in the statistical literature.

It has been shown that $\zeta$ and $\eta(\cdot)$ are not identifiable under the null
[Kosorok, Lee and Fine (2004)]. Intuitively, when $\beta = 0$, the covariate
process $Z$ provides no information for the failure time process. The frailty $W$
and the baseline hazard $\eta(\cdot)$ are not distinguishable from each other, hence
$\zeta$ and $\eta(\cdot)$ are not identifiable. Thus, the testing problem described above is
nonregular and standard asymptotic results are not applicable.
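
The indistinguishability of the frailty variance and the baseline hazard can be illustrated numerically. The following is a minimal sketch, not part of the original analysis. It relies on the standard fact that a gamma frailty with mean 1 and variance $\zeta$ yields, when $\beta = 0$, the marginal survival function $(1 + \zeta H(t))^{-1/\zeta}$, where $H$ is the cumulative baseline hazard. The target survival curve `target` and the helper names are hypothetical choices made only for illustration.

```python
import numpy as np

def marginal_survival(t, zeta, cum_hazard):
    """Marginal survival of T when beta = 0 and W is gamma with mean 1, variance zeta."""
    return (1.0 + zeta * cum_hazard(t)) ** (-1.0 / zeta)

def matching_cum_hazard(zeta, target_survival):
    """Cumulative baseline hazard that reproduces target_survival for the given zeta."""
    return lambda t: (target_survival(t) ** (-zeta) - 1.0) / zeta

# Hypothetical null survival function, chosen only for illustration.
target = lambda t: np.exp(-t)
t_grid = np.linspace(0.1, 3.0, 30)

# For each frailty variance, pick the baseline hazard that matches the target.
curves = [
    marginal_survival(t_grid, z, matching_cum_hazard(z, target))
    for z in (0.5, 1.0, 2.0)
]

# Largest discrepancy between any of the curves and the target distribution.
max_gap = max(np.max(np.abs(c - target(t_grid))) for c in curves)
print(max_gap)
```

Every value of the frailty variance reproduces the same observable distribution once the baseline hazard is adjusted, which is the loss of identifiability described above.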
1.2. Example 2: Change-point regression for current status data. Change-point
models have been studied extensively and have proven to be popular in
clinical research. In many settings, a change-point effect is realistic and can 
be much easier to interpret than a quadratic or more complex nonlinear effect 
[Chappell (1989)]. Change-point Cox models have been widely used in sur- 
vival applications, as in Kosorok and Song (2007); Luo, Turnbull and Clark 
(1997); Pons (2003), where likelihood ratio tests were investigated. However, 
to our knowledge, optimal testing has not been explored for such models. 

Under current status censoring, a subject is examined once at a random
observation time $V$ and at that time it is observed whether the event time
$T \le V$ or not. The observed data $\{X_i = (V_i, \Delta_i, Z_i), i = 1, \ldots, n\}$ consist
of $n$ i.i.d. realizations of $X = (V, \Delta, Z)$, where $\Delta = 1\{T \le V\}$ and $Z$ is a
$d$-dimensional covariate. Here we let $d = 1$ for simplicity. In this example,
we assume that the time to event $T$ satisfies a change-point Cox model
conditionally on the covariate $Z$. That is, the density of $X$ is given by:

(2) $p_\theta(x) = \{1 - \exp(-\Lambda(v) e^{r_\gamma(z)})\}^{\delta} \{\exp(-\Lambda(v) e^{r_\gamma(z)})\}^{1-\delta}$

with $r_\gamma(z) = \alpha z + (\beta_1 + \beta_2 z) 1\{z > \zeta\}$, where $\alpha$, $\beta_1$ and $\beta_2$ are scalar
regression parameters, $\zeta$ is the change-point parameter and $\Lambda(\cdot)$ is the
cumulative baseline hazard function. We also define the collected parameters
$\beta = (\beta_1, \beta_2)$, $\xi = (\beta, \alpha)$, $\gamma = (\xi, \zeta)$ and $\eta = (\alpha, \Lambda)$. We are particularly
interested in the hypothesis test of the existence of a change-point for regression
parameters in the above model, that is, $H_0 : \beta = 0$.
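
To make the role of the change-point under the null concrete, here is a minimal simulation sketch. It assumes the usual current status likelihood for a Cox model, in which the conditional survival function at the examination time is $\exp\{-\Lambda(v) e^{r(z)}\}$. The choices $\Lambda(t) = t$, uniform covariates and uniform examination times, and all function names, are hypothetical and serve only as an illustration.

```python
import numpy as np

def linear_predictor(z, alpha, beta1, beta2, zeta):
    """alpha*z + (beta1 + beta2*z) * 1{z > zeta}."""
    return alpha * z + (beta1 + beta2 * z) * (z > zeta)

def simulate_current_status(n, alpha, beta1, beta2, zeta, rng):
    """Draw (V, Delta, Z) assuming cumulative baseline hazard Lambda(t) = t."""
    z = rng.uniform(0.0, 1.0, size=n)            # hypothetical covariate law
    rate = np.exp(linear_predictor(z, alpha, beta1, beta2, zeta))
    t = rng.exponential(scale=1.0 / rate)        # event times, never observed directly
    v = rng.uniform(0.0, 2.0, size=n)            # hypothetical examination times
    delta = (t <= v).astype(float)               # only the event status at V is seen
    return v, delta, z

def log_likelihood(v, delta, z, alpha, beta1, beta2, zeta, cum_hazard=lambda t: t):
    """Current status log-likelihood, up to terms free of the parameters."""
    surv = np.exp(-cum_hazard(v) * np.exp(linear_predictor(z, alpha, beta1, beta2, zeta)))
    surv = np.clip(surv, 1e-12, 1.0 - 1e-12)     # guard the logarithms
    return np.sum(delta * np.log1p(-surv) + (1.0 - delta) * np.log(surv))

rng = np.random.default_rng(0)
v, delta, z = simulate_current_status(500, 0.5, 0.0, 0.0, 0.3, rng)

# With beta = (0, 0) the change-point drops out of the likelihood entirely.
ll_a = log_likelihood(v, delta, z, 0.5, 0.0, 0.0, zeta=0.2)
ll_b = log_likelihood(v, delta, z, 0.5, 0.0, 0.0, zeta=0.8)
print(ll_a == ll_b)
```

When both change-point coefficients are zero, the log-likelihood takes the same value at every candidate change-point, so the data carry no information about its location under the null.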

Although Cox regression with current status data was discussed by Huang 
(1996) and others, change-point Cox regression has not been studied with 
current status data. The development of optimal tests in the current status 
setting is further complicated by the fact that the nuisance parameter $\Lambda$
cannot be estimated at the parametric rate, unlike with right censored data. 

In model (2), the change-point parameter is present only under the
alternative. This is different from Example 1, where the odds rate parameter $\zeta$
and the baseline hazard function $\eta(\cdot)$ are both present, but indistinguishable,
under the null.

1.3. Description of main contribution. The statistical literature contains 
numerous precedents on the nonidentifiability problem in parametric models, 
see Chernoff (1954), Chernoff and Lander (1995), Dacunha-Castelle and Gassiat 
(1999) and Liu and Shao (2003). Among others, Dacunha-Castelle and Gassiat 
(1999) proposed a locally conic parametrization approach to enable asymp- 
totic expansions of the likelihood ratio test under loss of identifiability under 
the null. Liu and Shao (2003) derived a quadratic approximation of the
log-likelihood ratio function by using Hellinger distance. Most authors directly
study the approximation of the log-likelihood ratio function in some
neighborhood and obtain its asymptotic null distribution. However, the
asymptotic optimality properties of the classical likelihood ratio tests (LRT) do
not hold anymore [Lindsay (1995)] and Wald and score tests are not even
well defined in these nonstandard problems. To our knowledge, all results
for testing nonidentifiable $P_\theta$ using likelihood based tests are for parametric
models. The main aim of this paper is to investigate the construction of
optimal likelihood based tests for semiparametric models.

A key question which arises, as noted by Dacunha-Castelle and Gassiat 
(1999), is: since the parameter is not identifiable, around which point can 
an expansion be made? To address this question, we assume the existence 
of a "full rank" reparameterization which contains all the information of the 
null model and in which all parameters are identifiable. To be specific, we
partition $\theta = (\psi, \zeta)$ and $\psi = (\beta, \eta)$, where $\beta \in \mathbb{R}^p$ is a parameter of interest,
$\zeta \in \mathbb{R}^q$ and $\eta$ is a parameter defined on an arbitrary parameter space, $\mathcal{H}_\eta$.
We assume that the information in the null model can be absorbed into
the parameter space of $\eta$, through this full rank reparameterization. This is
made precise in Section 2. Note that Example 1 requires such a
reparameterization since both $\zeta$ and $\eta$ are present under the null. In contrast, such
a reparameterization is not required for Example 2 since $\zeta$ is not present
under the null.

When the models involved are parametric, a special case when $\eta$ does not
depend on $\zeta$ under the null, that is, $\zeta$ is only present under the
alternative, has been studied extensively by Andrews and Ploberger (1994); Davies
(1977, 1987); King and Shively (1993), and others. Davies (1977) showed
that the likelihood ratio test is optimal in the sense that as the significance
level of the test tends to zero, its power function approaches that of the
optimum test when $\zeta$ is given. These optimality results are very weak and
do not provide any guidance regarding the performance of the test in
practical applications, where the significance level is fixed, for example, at level
0.05 [Andrews (1999)]. Andrews and Ploberger (1994) studied optimal tests
for parametric models using the weighted average power criterion originally 
introduced by Wald (1943) when studying the likelihood ratio test under 
regularity conditions, where the model is identifiable under the null. Un- 
der loss of identifiability, the likelihood ratio test is generally less powerful 
than the optimal test in Andrews and Ploberger (1994). These optimal tests 
possess a Bayesian interpretation, where the weight corresponds to a prior 
on the nonidentifiable parameter, and are asymptotically equivalent to a 
Bayesian posterior odds ratio. 

In this paper, we adapt the weighted average power criterion [Andrews 
and Ploberger (1994), Wald (1943)] to construct optimal tests in semipara- 
metric models under loss of identifiability. Our main contribution is to extend 
the results of Andrews and Ploberger (1994) in at least four directions. 
First, Andrews and Ploberger (1994) address only parametric models, as
is the case for most of the literature on testing problems with
nonidentifiability under the null. Our optimality results are available for
semiparametric models, where $\eta$ may be infinite dimensional and $\zeta$ may not be
estimable at the usual parametric rate under either the null or the
alternative. A semiparametric profile likelihood approach is adopted to reduce the
infinite-dimensional model to a finite-dimensional uniformly least-favorable 
submodel; see Murphy and van der Vaart (2000) for a discussion of profile 
likelihood in regular settings. We note however, that the idea of uniformly 
least favorable submodels is a new concept in semiparametric settings, which 
is not discussed in Murphy and van der Vaart (2000). The development of 
this concept is both nontrivial and critical to establishing an appropriate 
optimality criterion for semiparametric models under loss of identifiability. 

Second, the results of Andrews and Ploberger (1994) are applicable only
for tests where a nuisance parameter (namely $\zeta$) is present only under the
alternative. This may not be true in our situation, where a nondegenerate
reparameterization may be needed to make $\zeta$ vanish under the null.
Furthermore, our tests and the optimality results do not depend on the
reparameterization.

Third, Andrews and Ploberger (1994) establish that their test is optimal
with respect to local alternatives for $\psi$ involving a multivariate normal prior
with singular covariance matrix. In our approach, it is only necessary to
specify the prior in the direction of $\beta$, the parameter of interest, and no
prior is needed on the remaining parameter $\eta$. This enables us to avoid the
singular covariance issue in Andrews and Ploberger (1994).

Fourth, we develop a simple and effective Monte Carlo method of inference 
for the proposed test statistics. 

Adopting a profile likelihood approach has several advantages. First,
under the identifiable submodel, the MLE for $\eta$ may converge at a slower rate
than the usual $\sqrt{n}$ rate, as in the change-point Cox model with current
status data. This causes the theoretical justification based on a Taylor
expansion of the full likelihood to fail. Second, even if the MLE of the nonparametric
component converges at the $\sqrt{n}$ rate, semiparametric likelihoods may not
be suitably "differentiable," in particular, when such a likelihood contains
certain empirical terms, as with, for example, the odds-rate model. Third,
handling the remainder terms in a Taylor type expansion is challenging,
owing to the presence of the infinite dimensional parameters, and a delicate
Banach space analysis is required. Employing the profile likelihood enables
us to address these issues rigorously.

1.4. Organization of paper. The remainder of the paper is organized as 
follows. In Section 2, we present the generic testing problem and the model 
and data assumptions. The optimality results are given in Section 3. We 
verify that the results hold for the examples in Section 4. In Section 5, we
describe a simulation study to evaluate the finite-sample behavior of the
proposed tests and to compare their efficiency with some alternative tests
for the current status example. In Section 6, we discuss some additional 
examples without identifiability under the null which are not covered in our 
current settings and which require further extensions. Proofs are given in 
Section 7. 

2. The hypothesis tests and assumptions. 

2.1. The optimal tests. In this subsection we formulate the tests of
hypotheses when the parameters are not identifiable under the null. Let $P_\theta$
denote the probability measure, based on observed data $X_n = (X_1, X_2, \ldots, X_n)$,
where $\theta \in \Theta$ and the subscript $n$ is the sample size. As mentioned
previously, the parameters $\theta \in \Theta_0$ under the null hypothesis are not identifiable.
We assume, as in the examples, that $\theta$ can be partitioned as $(\psi, \zeta)$, with $\zeta$
$q$-dimensional and $\psi$ of arbitrary dimension. We further assume that $\psi$ can
be partitioned as $(\beta, \eta)$ so that the null hypothesis can be stated in terms
of $\beta$, with the nuisance parameter $\eta$ having arbitrary dimension. The
likelihood function of the data is given by $l_n(\theta)$ and the profile likelihood for $\beta$
and $\zeta$ is defined as $pl_n(\beta, \zeta) = \sup_\eta l_n(\beta, \eta, \zeta)$. For the semiparametric model
$\{P_{(\beta,\eta,\zeta)}\}$ on a sample space $\mathcal{X}$, we assume $\beta \in \mathbb{R}^p$, $\zeta \in \Xi$, a compact subset
of $\mathbb{R}^q$, and $\eta \in \mathcal{H}_\eta$, which is a subset of a Banach space.

The hypotheses to be tested are: 

(3) $H_0 : \beta = \beta_0$ vs. $H_1 : \beta \neq \beta_0$.

When $\beta = \beta_0$, the null distribution $P_0$ is unique and the likelihood for a
single observation under the null is abbreviated as $l^0$. Let $\pi = (\eta, \zeta)$. The
null set of $\pi$ is $\Pi_0$ and its cardinality is the same as that of $\Xi$, which is at
least two. $\Theta_0 = \{\beta_0\} \times \Pi_0$. For each $\zeta \in \Xi$, $\eta_0(\zeta) = \{\tau \in \mathcal{H}_\eta : (\tau, \zeta) \in \Pi_0\}$ is an
interior point of $\mathcal{H}_\eta$. Let $\psi_0(\zeta) = (\beta_0, \eta_0(\zeta))$ and $\theta_0(\zeta) = (\psi_0(\zeta), \zeta)$. Thus,
$\Theta_0$ can be represented as $\Theta_0 = \{\theta_0(\zeta) : \zeta \in \Xi\}$.

Before introducing the optimal tests, we need some additional notation
for the parameter space and the score and information operators in the
semiparametric settings. We denote by $\dot{l}_\beta \in L_2^0(P_\theta)$ the derivative of $\log l_1(\theta)$
with respect to $\beta$, and $\ddot{l}_\beta$ is the second derivative of $\log l_1(\theta)$ with respect
to $\beta$. $L_2^0(P_\theta)$ refers to the class of square integrable functions under the
measure $P_\theta$ with mean 0. The score operator for $\eta$ is defined as $\dot{l}_\eta$, which is
a bounded linear map from $\mathcal{H}_\eta$ to $L_2^0(P_\theta)$ with adjoint operator $\dot{l}_\eta^* : L_2^0(P_\theta) \mapsto
\overline{\mathcal{H}}_\eta$, where $\overline{\mathcal{H}}_\eta$ is the closed linear span of $\mathcal{H}_\eta$. The information operator is
$\dot{l}_\eta^* \dot{l}_\eta : L_2^0(P_\theta) \mapsto L_2^0(P_\theta)$. The efficient score for $\beta$, $\tilde{l}_\beta$, is the ordinary score function
$\dot{l}_\beta$ minus its orthogonal projection onto the closed linear span of the score
operator $\dot{l}_\eta$. The efficient information for $\beta$ is $\tilde{I}_\beta = \int \tilde{l}_\beta \tilde{l}_\beta' \, dP_\theta$, which is the
asymptotic variance of the efficient score function.

We use the notations $\mathbb{P}_n$ and $\mathbb{G}_n$ for the empirical distribution and the
empirical process of the observations. That is, for every measurable function
$f$ and probability measure $P$,

$\mathbb{P}_n f = \frac{1}{n} \sum_{i=1}^n f(X_i)$, $\quad Pf = \int f \, dP$, $\quad \mathbb{G}_n f = \frac{1}{\sqrt{n}} \sum_{i=1}^n (f(X_i) - Pf)$.

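We note that although simultaneous estimation of $\beta$ and $\zeta$ fails under
the null due to nonidentifiability, estimation results for $\hat\beta_n(\zeta)$, the MLE of
$\beta$ at a fixed value of $\zeta$, are often valid under the null. This suggests making
inference about $\beta$ using $\hat\beta_n(\zeta)$. For fixed $\zeta \in \Xi$, the score, Wald and likelihood
ratio test statistics for testing $H_0$ against $H_1$ are given by

$R_n(\zeta) = n \, \mathbb{P}_n \tilde{l}_\beta(\hat\theta_0(\zeta))' \{\tilde{I}_n(\hat\theta_0(\zeta))\}^{-1} \mathbb{P}_n \tilde{l}_\beta(\hat\theta_0(\zeta))$,
$W_n(\zeta) = n (\hat\beta_n(\zeta) - \beta_0)' \tilde{I}_n(\hat\theta_n(\zeta)) (\hat\beta_n(\zeta) - \beta_0)$ and
$LR_n(\zeta) = -2\{\log l_n(\hat\theta_0(\zeta)) - \log l_n(\hat\theta_n(\zeta))\}$,

where $\hat\theta_n(\zeta) = (\hat\beta_n(\zeta), \hat\eta_n(\zeta), \zeta)$ is the unrestricted MLE of $\theta$ at a fixed value
of $\zeta$ and $\hat\theta_0(\zeta) = (\beta_0, \hat\eta_0(\zeta), \zeta)$ is the restricted MLE of $\theta$ for a fixed value of $\zeta$
under the null. $\mathbb{P}_n \tilde{l}_\beta(\hat\theta_0(\zeta)) = \mathbb{P}_n \tilde{l}_\beta(\beta_0, \hat\eta_0(\zeta), \zeta)$ is the empirical score function
of $\beta$ evaluated at the restricted MLE $\hat\theta_0(\zeta)$. $\mathbb{P}_n \tilde{l}_\beta(\hat\theta_n(\zeta)) = \mathbb{P}_n \tilde{l}_\beta(\hat\beta_n(\zeta), \hat\eta_n(\zeta), \zeta)$
is the empirical score function of $\beta$ evaluated at the unrestricted MLE $\hat\theta_n(\zeta)$.
The inverse matrix of $\tilde{I}_n(\hat\theta_n(\zeta))$, a consistent estimator of the efficient
information $\tilde{I}_\beta$ under the null, estimates the covariance matrix of $\sqrt{n}\,\hat\beta_n(\zeta)$.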
The optimal tests we propose take the form

$ER_n = (1 + c)^{-p/2} \int \exp\left( \frac{1}{2} \frac{c}{1 + c} R_n(\zeta) \right) dJ(\zeta)$,
$EW_n = (1 + c)^{-p/2} \int \exp\left( \frac{1}{2} \frac{c}{1 + c} W_n(\zeta) \right) dJ(\zeta)$ and
$ELR_n = (1 + c)^{-p/2} \int \exp\left( \frac{1}{2} \frac{c}{1 + c} LR_n(\zeta) \right) dJ(\zeta)$,

where $c > 0$ is a known constant and $J(\cdot)$ is a pre-selected integrable prior
on $\zeta$. Their optimality will be discussed in Section 3. We note that, in
semiparametric settings, the computation of the efficient information may
involve high dimensional maximization and nonparametric smoothing. The
tests $ER_n$ and $EW_n$ may then be computationally harder than $ELR_n$, so
the likelihood ratio based test $ELR_n$ is more attractive in these settings.
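
Once the fixed-$\zeta$ statistics have been evaluated on a grid, the exponential average reduces to a weighted sum. The sketch below is illustrative only: it assumes the prior $J$ is approximated by discrete weights on a finite grid of change-point (or frailty) values, and the function name `exponential_average` and the numerical inputs are hypothetical.

```python
import numpy as np

def exponential_average(stat_values, weights, c, p):
    """(1 + c)^(-p/2) * sum_k w_k * exp(0.5 * c / (1 + c) * stat_k).

    stat_values: R_n, W_n or LR_n evaluated on a grid of zeta values.
    weights:     discrete approximation to the prior J (normalized here).
    c:           positive constant from the normal weight on local alternatives.
    p:           dimension of the parameter of interest beta.
    """
    stat_values = np.asarray(stat_values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    exponent = 0.5 * c / (1.0 + c) * stat_values
    shift = exponent.max()                      # stabilizes large exponents
    integral = np.exp(shift) * np.sum(weights * np.exp(exponent - shift))
    return (1.0 + c) ** (-p / 2.0) * integral

# Hypothetical likelihood ratio values on a three-point grid, uniform prior.
lr_values = [1.2, 4.5, 0.3]
elr = exponential_average(lr_values, [1.0, 1.0, 1.0], c=1.0, p=1)
print(elr)
```

The same routine serves all three tests, since they differ only in which fixed-$\zeta$ statistic is supplied.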

In construction of the optimal tests, understanding and computing $\theta_0(\zeta)$
may be complicated due to the dependence of the parameter $\theta_0(\zeta)$ on $\zeta$.
Assuming the existence of the following full rank reparameterization, we
can eliminate the dependence between $\eta$ and $\zeta$, thereby easing both the
theoretical developments and the computations for the proposed tests.

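2.2. Full rank reparameterization: Breaking the dependence between $\eta$ and
$\zeta$. We assume there exists a map $\phi_\zeta : \mathcal{H}_\eta \mapsto \mathcal{H}_\eta$, which is one-to-one and
uniformly Hadamard-differentiable at $\eta$ tangentially to $\mathcal{H}_\eta$ over $\zeta \in \Xi$, that
is,

$$\sup_{(\eta + t_n h_n(\zeta), \zeta) \in \Pi} \left\| \frac{\phi_\zeta(\eta + t_n h_n(\zeta)) - \phi_\zeta(\eta)}{t_n} - \dot\phi_\zeta(h(\zeta)) \right\| \to 0,$$

as $\sup_{\zeta \in \Xi} \|h_n(\zeta) - h(\zeta)\| \to 0$ and $t_n \to 0$, where $h(\zeta)$ is in the tangent space
of $\mathcal{H}_\eta$ for all $\zeta \in \Xi$ and $\|\cdot\|$ denotes the norm of $\mathcal{H}_\eta$. Its derivative $\dot\phi_\zeta$ is
one-to-one and continuously invertible uniformly over $\zeta \in \Xi$. That is, there exists
a positive constant $c$ such that $\|\dot\phi_\zeta(\eta_1(\zeta) - \eta_2(\zeta))\| \ge c \|\eta_1(\zeta) - \eta_2(\zeta)\|$ for
every $\eta_1(\zeta)$ and $\eta_2(\zeta)$ in $\mathcal{H}_\eta$ for all $\zeta \in \Xi$. Let $\tilde\eta = \phi_\zeta(\eta)$, and $\ell_1(\beta_0, \tilde\eta, \zeta)(x) =
l_1(\beta_0, \phi_\zeta^{-1}(\tilde\eta), \zeta)(x) = l^0(x)$, where $\zeta$ vanishes under the null, for all $x$ in $\mathcal{X}$.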

This reparameterization does not change the likelihood, that is, the
equality $l_1(\beta, \eta(\zeta), \zeta)(x) = \ell_1(\beta, \tilde\eta, \zeta)(x)$ holds both under the null and the
alternative. Under the null, the likelihood $l_1(\beta_0, \eta_0(\zeta), \zeta) = \ell_1(\beta_0, \tilde\eta_0, \zeta)$ for a specific
$\tilde\eta_0$, which does not depend on $\zeta$, and $\zeta$ disappears in the null likelihood. We
thus reduce the parameter dimension of the null space from $\Pi_0$ to $\mathcal{H}_\eta$. For
Example 2 in the Introduction, $\phi_\zeta$ can be taken to be the identity and thus
the reparameterization is not needed. In contrast, a reparameterization is
needed for Example 1. We will give the details later in Section 4.

The reason we assume the existence of such a full rank reparameterization
is to eliminate the dependence between $\eta$ and $\zeta$. The issue is that the
optimality results are with respect to a perturbation of the parameter $\eta$,
which is not well defined in the original space, due to the dependence
between parameters $\eta$ and $\zeta$. Subsequent assumptions are built on the new
parameterization $\tilde\theta = (\beta, \tilde\eta, \zeta)$. However, the results still hold for the
original parameterization, since the efficient score and efficient information of
$\beta$ are invariant under such reparameterization of $\eta$, as given in the following
lemma:

Lemma 1. Under the full rank reparameterization, $\tilde{l}_\beta(\theta) = \tilde\ell_\beta(\tilde\theta)$, where
$\tilde\ell_\beta(\tilde\theta)$ is the efficient score of $\beta$ under the new reparameterization. The
efficient information matrix is also invariant to these reparameterizations.

Remark 1. The full rank reparameterization defined above may not 
be unique. We will show later in the proof of Theorem 2 that the optimal 
tests proposed in this paper are invariant to the choice of the full rank 
reparameterization. 
Next we discuss how to construct the optimal tests with the new
parameterization, where $\zeta$ vanishes, and $\psi$ does not depend on $\zeta$ under the
null.

2.3. Constructing optimal tests under the full rank reparameterization.
Though $\zeta$ disappears in the likelihood under the null hypothesis, the score
and information are still processes indexed by $\zeta$. For fixed $\zeta \in \Xi$, the score,
Wald and likelihood ratio test statistics for testing $H_0$ against $H_1$ with the
new parameterization can be represented as:

$R_n(\zeta) = n \, \mathbb{P}_n \tilde\ell_\beta(\hat\psi_0, \zeta)' \{\tilde{I}_\beta(\hat\psi_0, \zeta)\}^{-1} \mathbb{P}_n \tilde\ell_\beta(\hat\psi_0, \zeta)$,
$W_n(\zeta) = n (\hat\beta_n - \beta_0)' \tilde{I}_\beta(\hat\psi_n, \zeta) (\hat\beta_n - \beta_0)$ and
$LR_n(\zeta) = -2\{\log \ell_n(\hat\psi_0, \zeta) - \log \ell_n(\hat\psi_n, \zeta)\}$,

where $\hat\psi_n$ is the unrestricted MLE of $\psi$ and $\hat\psi_0$ is the restricted MLE of $\psi_0$.
$\mathbb{P}_n \tilde\ell_\beta(\beta_0, \hat\eta_0, \zeta)$ is the empirical score function of $\beta$ evaluated at the restricted
MLE $\hat\psi_0$. $\mathbb{P}_n \tilde\ell_\beta(\hat\beta_n, \hat\eta_n, \zeta)$ is the empirical score function of $\beta$ evaluated at
the unrestricted MLE $\hat\psi_n$. The inverse matrix of $\tilde{I}_\beta(\hat\psi_n, \zeta)$, a consistent
estimator of the efficient information of $\beta$ under the null, estimates the
covariance matrix of $\sqrt{n}\,\hat\beta_n$. It is thus obvious that the optimal tests are invariant
with respect to the choice of full rank reparameterizations.

To further study the asymptotic distribution and the optimality of the 
proposed tests, we need the following assumptions, based on the full rank 
reparameterization. We note that except assumption C, all other assump- 
tions can also be stated with the original parameterization. 

2.4. The assumptions based on the reparameterization. To derive
asymptotically optimal tests of $H_0$, we consider local alternatives to $H_0$ of the form
$\ell_n(\beta_0 + h/\sqrt{n}, \eta, \zeta)$ with $\zeta$ and $\eta$ unspecified. The optimality criterion will
involve a weighted average power criterion, where the averaging is with
respect to an integrable prior $Q_\zeta(h)$ on the values of $h$ in $\mathbb{R}^p$ defining local
alternatives and an integrable prior $J(\zeta)$ on $\zeta$. Before formally stating the
optimality criterion, we give assumptions on the data and the parameter
spaces. The first two assumptions postulate the existence of the prior on
local alternatives, $Q_\zeta(h)$.

A1 The efficient information function of $\beta$ evaluated at $(\psi_0, \zeta)$, $\tilde{I}_\beta(\psi_0, \zeta)$, is
uniformly continuous in $\beta$ and $\zeta$ over $B_0 \times \Xi$, where $B_0$ is some
neighborhood of $\beta_0$. Furthermore, $\tilde{I}_\beta(\psi_0, \zeta)$ is uniformly positive definite over
$\zeta \in \Xi$, that is, $\inf_{\zeta \in \Xi} \lambda_{\min}\{\tilde{I}_\beta(\psi_0, \zeta)\} > 0$, where $\lambda_{\min}(C)$ is the smallest
eigenvalue of the matrix $C$.
A2 $Q_\zeta$ is a normal measure with mean $\beta_0$ and variance $c \tilde{I}_\beta^{-1}(\psi_0, \zeta)$ for $\zeta \in \Xi$,
where $c > 0$ is a scalar constant.

Assumptions A1 and A2 are analogous to Assumptions 1(e), 1(f) and 4 of
Andrews and Ploberger (1994), although there are fundamental differences.
Andrews and Ploberger (1994) work directly by building on the full
parametric likelihood and their assumptions refer to the information matrix for
all parameters. Furthermore, their optimality results are defined in terms
of local alternatives for $\psi$, where the prior is a multivariate normal with
singular covariance matrix. Our assumptions A1 and A2 are only for the
parameter of interest, $\beta$, with no prior assumptions needed for $\eta$ under either
the null or the alternative.

The next set of conditions assumes the existence of a uniformly least- 
favorable submodel. This submodel can be viewed as a "uniform" version of 
the least favorable submodel discussed in Murphy and van der Vaart (2000): 
the convergence rate of the nuisance parameter now is in the "uniform" 
sense, and the efficient score and the efficient information possess Donsker 
and Glivenko-Cantelli properties with "larger" index sets, respectively. When 
the set of $\zeta$ is a singleton, this new submodel concept reduces to the ordinary
least favorable submodel. The development of this concept is critical to
establishing an appropriate optimality criterion for general semiparametric 
models under loss of identifiability. Here are the needed assumptions: 

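B1 There exists a map $t \mapsto f_t$ from a fixed neighborhood of $\beta_0$ into $\mathcal{H}_\eta$, such
that the map $t \mapsto \ell(t, \theta)$ defined by $\ell(t, \theta) = \log \ell_1(t, f_t, \zeta)$ is twice
continuously differentiable. Let $\dot\ell(t, \theta)$ and $\ddot\ell(t, \theta)$ denote the derivatives with
respect to $t$. The submodel with parameters $(t, f_t, \zeta)$ passes through $\eta$
at $t = \beta$, that is, $f_\beta(\beta, \eta, \zeta) = \eta$ for all $\zeta \in \Xi$.

B2 The submodel is uniformly least-favorable at $\psi_0 = (\beta_0, \eta_0)$ and $\zeta$ for
estimating $\beta_0$ in the sense that $\dot\ell(\beta_0, \psi_0, \zeta) = \tilde\ell_\beta(\psi_0, \zeta)$. As $(t, \beta, \eta) \to
(\beta_0, \beta_0, \eta_0)$, we assume that $\sup_{\zeta \in \Xi} \|\dot\ell(t, \psi, \zeta) - \tilde\ell_\beta(\psi_0, \zeta)\| = o_{P_0}^{\Xi}(1)$ and
$\sup_{\zeta \in \Xi} \|\ddot\ell(t, \psi, \zeta) - \ddot\ell(\beta_0, \psi_0, \zeta)\| = o_{P_0}^{\Xi}(1)$. In the sequel, we let $o_P^{\Xi}$ denote
a quantity going to zero in probability, under $P$, uniformly over the set
$\Xi$.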
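B3 We assume that $\hat\psi_0$, the restricted MLE of $\psi$ under the null, satisfies
$\hat\psi_0 = \psi_0 + o_{P_0}(1)$. The unrestricted MLE $\hat\psi_n(\zeta) = \psi_0 + o_{P_0}^{\Xi}(1)$.
Moreover, let $\hat\eta_\beta(\zeta) = \arg\max_\eta \ell_n(\beta, \eta, \zeta)$, that is, $p\ell_n(\beta, \zeta) = \ell_n(\beta, \hat\eta_\beta(\zeta), \zeta)$.
Assume that for any random sequence $\tilde\beta_n \to_P \beta_0$, we have $\hat\eta_{\tilde\beta_n}(\zeta) =
\eta_0 + o_{P_0}^{\Xi}(1)$ and the following uniform "no-bias" condition holds:

(4) $P_0 \dot\ell(\beta_0, \tilde\beta_n, \hat\eta_{\tilde\beta_n}(\zeta), \zeta) = o_{P_0}^{\Xi}(\|\tilde\beta_n - \beta_0\| + n^{-1/2})$.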
B4 There exist neighborhoods $U$ of $\beta_0$ and $V$ of $\psi_0$, such that the class
of functions $\{\dot\ell(t, \psi, \zeta) : t \in U, \psi \in V, \zeta \in \Xi\}$ is $P_0$-Donsker with square
integrable envelope function and the class of functions $\{\ddot\ell(t, \psi, \zeta) : t \in
U, \psi \in V, \zeta \in \Xi\}$ is $P_0$-Glivenko-Cantelli and is bounded in $L_1(P_0)$, where
$L_1(P_0)$ refers to the class of integrable functions under $P_0$.

Assumptions B1-B4 set the stage for the quadratic expansion of the pro- 
file likelihood and the derivation of the optimality properties of the pro- 
posed tests. Note that these assumptions can also be built on the original 
parameterization, but we use the new parameterization for ease of presen- 
tation. Since our formulation includes parametric models as special cases, 
the existence of a uniformly least-favorable submodel in our set-up covers 
all situations considered by Andrews and Ploberger (1994). 

Compared with Andrews and Ploberger (1994), we have a stronger form 
of the unbiasedness condition and stronger requirements on the consistency 
of the estimators for the expansion of the profile likelihood. This is partly due 
to the more general structure of the semiparametric model. As in
assumption B3, we require that if $\tilde\beta_n$ is any sequence of estimators consistent for
$\beta_0$, $\hat\eta_{\tilde\beta_n}(\zeta)$ must be consistent for $\eta_0$, the true value of the nuisance
parameter $\eta$, uniformly over $\Xi$. In Andrews and Ploberger (1994), consistency is
only needed for the unconstrained MLE (assumption 2) and the constrained 
MLE under the null hypothesis (assumption 3). 

To evaluate the local asymptotic distribution of the proposed tests, we
require differentiability in quadratic mean (DQM) of the parameters $\psi$, as
stated in the following assumption C, which is commonly used to evaluate
the local power. It will be verified for the two examples presented in the
introduction. Unlike assumptions B1-B4, the full rank reparameterization
is indispensable in assumption C:

C Differentiability in quadratic mean of the parameter $\psi$. A perturbation
of $\psi$ in its domain is $\psi_t = \psi + th + o(1)$, where $h = (h_\beta, h_\eta)$, $h_\beta \in \mathbb{R}^p$
and $h_\eta \in \overline{\mathcal{H}}_\eta$. The DQM condition for $\psi$ with respect to the collection
of paths $\{\psi_t\}$ is:

$$\int \left[ \frac{dP_{\psi_t, \zeta}^{1/2} - dP_{\psi, \zeta}^{1/2}}{t} - \frac{1}{2} A_\psi h \, dP_{\psi, \zeta}^{1/2} \right]^2 \to 0 \quad \text{as } t \to 0,$$

for all $\zeta \in \Xi$, where $A_\psi$ is a bounded linear operator defined on $\mathbb{R}^p \times \overline{\mathcal{H}}_\eta$
and takes values in $L_2^0(P_\theta)$.

Differentiability in quadratic mean implies that the range of $A_\psi$ is
contained in $L_2^0(P_\theta)$. Note that $A_\psi h = (\partial/\partial t) \log \ell_1(\psi_t, \zeta)|_{t=0}$, following similar
arguments as in Kosorok and Song (2007), where $h = (h_\beta, h_\eta)$. We define $A_\psi$ to be
given by $A_\psi(h_\beta, h_\eta) = \dot\ell_\beta(\psi, \zeta) h_\beta + \dot\ell_\eta(\psi, \zeta) h_\eta$, where $\dot\ell_\beta$ and $\dot\ell_\eta$ are the score
operators for $\beta$ and $\eta$, respectively. Moreover, $\mathbb{R}^p \times \overline{\mathcal{H}}_\eta$ is a Hilbert space with
$\|\cdot\|$ denoting its norm and $\langle \cdot, \cdot \rangle$ denoting its inner product. Since in parametric
settings, twice continuous differentiability implies DQM [Pollard (1995)],
this assumption is weaker than Assumption 1(c) in Andrews and Ploberger
(1994).

3. Main results. This section includes several main results. The first one 
gives the asymptotic null distribution of the proposed tests. 

3.1. The distributions of the test statistics under the null. To establish 
the asymptotic null distribution of the test statistics, a key result about the 
uniform profile likelihood expansion is summarized in the following lemma. 

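Lemma 2. Under assumptions A-C, for any random sequence $\tilde\beta_n \to_P
\beta_0$,

(5) $\log pl_n(\tilde\beta_n, \zeta) = \log pl_n(\beta_0, \zeta) + n (\tilde\beta_n - \beta_0)' \mathbb{P}_n \tilde{l}_\beta(\theta_0(\zeta)) - \frac{1}{2} n (\tilde\beta_n - \beta_0)' \tilde{I}_\beta(\theta_0(\zeta)) (\tilde\beta_n - \beta_0) + o_{P_0}^{\Xi}(\sqrt{n} \|\tilde\beta_n - \beta_0\| + 1)^2$.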

Lemma 2 enables us to establish the asymptotic equivalence of these test 
statistics and their asymptotic distributions: 

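Theorem 1. Under assumptions A-C, $ELR_n = EW_n + o_{P_0}(1) = ER_n +
o_{P_0}(1) \rightsquigarrow e_\chi(c)$, where

$e_\chi(c) = (1 + c)^{-p/2} \int \exp\left( \frac{1}{2} \frac{c}{1 + c} G(\theta_0(\zeta))' \tilde{I}_\beta^{-1}(\theta_0(\zeta)) G(\theta_0(\zeta)) \right) dJ(\zeta)$,

and $G(\theta_0(\zeta))$ is the limiting process of $\mathbb{G}_n \tilde{l}_\beta(\theta_0(\zeta))$, which is a mean zero
Gaussian process with variance function $\sigma^2(\zeta) = \tilde{I}_\beta(\theta_0(\zeta))$ indexed by $\zeta$ and
with covariance function $\sigma^2(\zeta_1, \zeta_2) = P_0\{\tilde{l}_\beta(\theta_0(\zeta_1)) \tilde{l}_\beta(\theta_0(\zeta_2))'\}$, indexed by $\zeta_1$
and $\zeta_2$, for $\zeta$, $\zeta_1$ and $\zeta_2 \in \Xi$.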

Remark 2. When $J(\cdot)$ does not correspond to a prior on $\zeta$, corresponding
rather to a weight function lacking a probabilistic interpretation, then
the results in Theorem 1 will generally hold, although the test may no longer
possess the optimality discussed in the sequel. Theorem 1 should also hold if
$Q_\zeta(h)$ is not a prior distribution, corresponding rather to a weight function
on local alternatives for $\beta$. This robustness indicates that the tests are
generally valid under loss of identifiability, yielding a large class of test statistics,
with the optimal test being a member of this class.
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We note that Theorem 1 only holds for normal weight Q^, which corre- 
sponds to the uniform least favorable direction. As indicated in the proof of 
Theorem 1, the normal weight function Qf(-) is integrated out, hence does 
not appear in the test with the original form. Subsequently, the optimal 
tests depend on the weight function Q^(-) only through the scalar c. The 
larger c is, the more weight is given to alternatives for which (3 is large. For 
example, for a test of the change-point model, larger values of c correspond 
to greater weight being given to larger changes. In the special case where 
J(C) is a pointmass at a single value £o, the optimal test rejects if and only 
if LR((o) exceeds some constant (i.e., the optimal test equals the standard 
score test for fixed Co) an d the optimal test is independent of c. When J(C) 
is not a pointmass distribution, however, the optimal test ELR n depends 
on c. The larger c is, the more power is directed at alternatives for which (5 
is large. 

The limit as $c \to 0$ of the $2(ELR_n - 1)/c$ statistic is equal to the "average
score" statistic $\int LR_n(\zeta)\, dJ(\zeta)$, which is the limit of the ELR statistics that
are designed for alternatives that are very close to the null hypothesis. At
the other extreme, the limit as $c \to \infty$ is $\log \int \exp(LR_n(\zeta)/2)\, dJ(\zeta)$. Thus
for testing against more distant alternatives, the optimal test statistic is still
of an average exponential form.
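
The two limits can be checked numerically on a grid. The sketch below assumes the exponential average form $(1+c)^{-p/2}\int \exp\{\frac{1}{2}\frac{c}{1+c} LR_n(\zeta)\}\,dJ(\zeta)$ with $J$ replaced by quadrature weights; the values in `lr` are made up for illustration and `exp_average_stat` is a hypothetical helper name. Two hedges apply. A first-order expansion in $c$ gives $\int LR_n\,dJ - p$ for the small-$c$ limit, which is the average score statistic up to the additive constant $p$ and therefore defines the same test. For large $c$ the sketch takes the limit of $\log\{(1+c)^{p/2} ELR_n\}$, which is an assumption about the intended normalization.

```python
import numpy as np

def exp_average_stat(lr, j_weights, c, p):
    """(1 + c)^(-p/2) * sum_j w_j * exp(0.5 * c / (1 + c) * LR_j)."""
    lr = np.asarray(lr, dtype=float)
    w = np.asarray(j_weights, dtype=float)
    return (1.0 + c) ** (-p / 2.0) * np.sum(w * np.exp(0.5 * c / (1.0 + c) * lr))

lr = np.array([0.4, 2.5, 6.1, 1.3])   # made-up LR_n(zeta) values on a grid
w = np.full(lr.size, 0.25)            # uniform weight J on the grid
p = 2                                 # dimension of beta (assumed)

# Small c: 2 (ELR - 1) / c approaches the average score statistic minus p.
c_small = 1e-6
near_null = 2.0 * (exp_average_stat(lr, w, c_small, p) - 1.0) / c_small
avg_score = np.sum(w * lr)

# Large c: log{(1 + c)^(p/2) ELR} approaches log of the average of exp(LR / 2).
c_large = 1e8
far = np.log((1.0 + c_large) ** (p / 2.0) * exp_average_stat(lr, w, c_large, p))
avg_exp = np.log(np.sum(w * np.exp(lr / 2.0)))
```

Comparing `near_null` with `avg_score - p` and `far` with `avg_exp` shows how the statistic interpolates between an average and an exponential average of the pointwise statistics.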

If the constant $c/(1 + c)$ which appears in the definition of $ELR_n$ is replaced
by a constant $r > 0$, then the limit as $r \to \infty$ of $ELR_n$ is the likelihood
ratio test, equivalently, the "sup score" statistic studied in Kosorok and Song
(2007). Hence, the sup score test is designed for distant alternatives, but is 
of a more extreme form than the optimal exponential test, since the latter 
requires r < 1. It can be easily shown as a corollary to Theorem 1 that the 
usual likelihood ratio, Wald and score tests have the following distribution: 

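Corollary 1. Under the null hypothesis and assumptions A-C,
$\sup_\zeta LR_n(\zeta) = \sup_\zeta W_n(\zeta) + o_{P_0}(1) = \sup_\zeta R_n(\zeta) + o_{P_0}(1) \to_d \chi$, with
$\chi = \sup_\zeta G(\theta_0(\zeta))'\,\tilde I_\beta^{-1}(\theta_0(\zeta))\,G(\theta_0(\zeta))$.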

3.2. Optimality of the proposed tests. The second main result of this paper
is the optimality property of the proposed tests. Following assumptions
in Section 2, we consider local alternatives $\beta = \beta_0 + h_\beta/\sqrt{n} + o(n^{-1/2})$ for
$h_\beta \in \mathbb{R}^p$ with prior distribution $Q_\zeta(h_\beta)$ on the local alternative direction $h_\beta$
and prior distribution $J(\zeta)$ on the nonidentifiable parameter $\zeta$. The optimality
result is as follows:

Theorem 2. Under assumptions A-C, the test statistics in Theorem 1
are asymptotically uniformly most powerful for testing $H_0: \beta = \beta_0$ against
the contiguous alternative

$\int dP_{\psi_0 + h/\sqrt{n} + o(n^{-1/2}),\,\zeta}\; dQ_\zeta(h_\beta)\, dJ(\zeta),$

where $h = (h_\beta, h_\eta(\zeta))$, $h_\eta(\zeta) = \tilde q_\zeta' h_\beta$ and where $\tilde q_\zeta = -(\ell_\eta^*\ell_\eta)^{-1}\ell_\eta^*\ell_\beta(\zeta)$ is the
uniformly least-favorable direction indexed by $\zeta$. Moreover, this optimality
result is invariant under the choice of reparameterization.

Theorem 2 also implies that the proposed tests have the greatest weighted
average power asymptotically in the class of all tests of asymptotic significance
level $\alpha$, against the alternative $P_{\psi_0 + h/\sqrt{n} + o(n^{-1/2}),\,\zeta}$. That is, they
maximize

$\lim_{n\to\infty} \int P\big(\phi_n \text{ rejects} \mid \psi_0 + h/\sqrt{n} + o(n^{-1/2}),\, \zeta\big)\, dQ_\zeta(h_\beta)\, dJ(\zeta)$

over all tests $\phi_n$ of asymptotic level $\alpha$.

Our optimality results are under alternatives $\beta_0 + h_\beta/\sqrt{n} + o(n^{-1/2})$, with
nonsingular normal weights on $h_\beta$. Our weights on $h_\beta$ are precisely Andrews
and Ploberger's [2] weights projected onto the parameter space that is of
interest. Thus, our results and Andrews and Ploberger's are consistent.

We now discuss the choice of the direction $q_\zeta$, the priors $Q_\zeta(\cdot)$ and $J(\cdot)$. By
the Neyman-Pearson lemma, for any appropriate prior distributions $Q_\zeta(\cdot)$
and $J(\cdot)$ and any known directions $q_\zeta$, a UMP test for testing $H_0: \beta = \beta_0$
against the contiguous alternative $\int dP_{\psi_0 + h/\sqrt{n} + o(n^{-1/2}),\,\zeta}\; dQ_\zeta(h_\beta)\, dJ(\zeta)$,
where $h = (h_\beta, h_\eta(\zeta))$, $h_\eta(\zeta) = q_\zeta' h_\beta$, is defined by

$\phi_n = 1$ if $QLR_n > k_{\alpha n}$,
$\phi_n = \lambda_n$ if $QLR_n = k_{\alpha n}$,
$\phi_n = 0$ if $QLR_n < k_{\alpha n}$,

where $k_{\alpha n} > 0$, $\lambda_n \in [0, 1]$ are constants such that the rejection probability
is $\alpha$ under the null and

$QLR_n = \int \frac{dP^n_{\psi_0 + h/\sqrt{n} + o(n^{-1/2}),\,\zeta}}{dP^n_{\psi_0}}\; dQ_\zeta(h_\beta)\, dJ(\zeta).$

We have the following result: 

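Corollary 2. Under assumptions A-C, the null hypothesis and the
contiguous alternatives,

$QLR_n = (1 + c)^{-p/2} \int \exp\Big(\frac{1}{2}\,\frac{c}{1+c}\, LR_n(\zeta)\Big)\, W(q_\zeta, \zeta)\, dJ(\zeta) + o_P(1),$

where $W(q_\zeta, \zeta) \le 1$ is defined in equation (17) in Section 7 below. When
$q_\zeta = \tilde q_\zeta$, the uniformly least-favorable direction, $W(q_\zeta, \zeta) = 1$ and $QLR_n = ELR_n + o_{P_0}(1)$.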

As the alternatives we consider are contiguous to the null, in each direction
$q_\zeta$, which indexes $QLR_n$, there exists a consistent estimator $\hat\eta_n(q_\zeta)$ of $\eta$
by the convolution theorem, provided certain conditions hold. The optimal
tests can thus be built on $\hat\eta_n(q_\zeta)$.

In applications with composite hypotheses, where $\eta$ is unknown, there
may not exist a direction which can maximize the power over all directions
[Bickel, Ritov and Stoker (2006)]. In a regular testing problem, where all
parameters are identifiable, it can be shown that the likelihood ratio test,
which is built on the uniformly least-favorable direction, will maximize the
minimum power of all directions of the alternatives, over all the test based
directions. In our nonregular testing problem, the situation is further complicated,
since the power depends on the covariance structure of $G(\theta_0(\zeta))$.
It is not clear if the maximin property still holds in our problem. We note,
however, that our tests can be interpreted as the "maximum direction"
test. Moreover, since the power of the test is not affected by multiplying
by a constant in $QLR_n$, we can standardize $W(q_\zeta, \zeta)\, dJ(\zeta)$ to obtain $dJ(\zeta)$,
which is a probability measure on $\Xi$. Then the question of the optimal choice
of both $q_\zeta$ and $J(\zeta)$ reduces to the question of the optimal choice of $J(\zeta)$.
Hence, without loss of generality we can replace $q_\zeta$ with the uniformly
least-favorable direction $\tilde q_\zeta$. For this reason, we should choose $q_\zeta = \tilde q_\zeta$ and
focus on the choice of $Q_\zeta(\cdot)$ and $J(\cdot)$ for optimization.

One reason we use the normal weight for $Q_\zeta$ in this paper is to facilitate
a comparison with Andrews and Ploberger (1994). Using the normal prior
with covariance matrix proportional to the efficient information matrix also
leads to a significant simplification of the representation of the test statistics,
since many terms cancel in the proof of Theorem 1. However, we note that
the choice of $Q_\zeta(\cdot)$ is not limited to the normal weight studied in this paper,
as indicated in the proof of Theorem 2. More general choices of the priors
$Q_\zeta(\cdot)$ and $J(\cdot)$ merit future consideration, but this is beyond the scope of
the current paper.

The optimality of the likelihood ratio statistics with loss of identifiabil- 
ity under the null for semiparametric models is of potential interest. Simi- 
lar to the likelihood ratio test under loss of identifiability with parametric 
models [Andrews and Ploberger (1994)], in the semiparametric setting, the 
profile likelihood ratio statistic is not of the optimal average exponential 
form. It can be shown to be a limit of an average exponential test, but 
only if a parameter is pushed beyond an admissible boundary, as noted by 
Andrews and Ploberger (1995) in the parametric case. 

3.3. The distributions of the test statistics under local alternatives. To
gain insight into the power of the optimal tests in practice, it is worthwhile to
study their asymptotic distributions under local alternatives. In the following
two theorems, Theorem 3 gives the asymptotic distribution for fixed local
alternatives $P_{\psi_0 + h/\sqrt{n} + o(n^{-1/2}),\,\zeta_1}$, while Theorem 4 gives the asymptotic
distribution for random local alternatives $\int dP_{\psi_0 + h/\sqrt{n} + o(n^{-1/2}),\,\zeta}\; dQ_\zeta(h_\beta)\, dJ(\zeta)$.
As shown in the theorems, the distributions depend on the form of the alternative,
which will depend in part on the specifics of the application. These
results also usually depend on the prior distributions $J(\cdot)$ and $Q_\zeta(\cdot)$, for both
fixed alternatives and random alternatives, although in different manners.

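Theorem 3. Under local alternatives $P_{\psi_0 + h/\sqrt{n} + o(n^{-1/2}),\,\zeta_1}$ and assumptions
A-C, $ELR_n = EW_n + o_P(1) = ER_n + o_P(1) \to_d f_\chi(c)$, with

$f_\chi(c) = (1 + c)^{-p/2} \int \exp\Big[\frac{1}{2}\,\frac{c}{1+c}\,\{G(\theta_0(\zeta)) + \nu(h_\beta, \zeta, \zeta_1)\}'\,\tilde I_\beta^{-1}(\theta_0(\zeta))\,\{G(\theta_0(\zeta)) + \nu(h_\beta, \zeta, \zeta_1)\}\Big]\, dJ(\zeta),$

where $\nu(h_\beta, \zeta, \zeta_1) = P_0\{\tilde\ell_\beta(\theta_0(\zeta))\,\tilde\ell_\beta(\theta_0(\zeta_1))'\}\, h_\beta$.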



Now we establish the asymptotic distribution of the test statistics under
the alternative $\int dP_{\psi_0 + h/\sqrt{n} + o(n^{-1/2}),\,\zeta}\; dQ_\zeta(h_\beta)\, dJ(\zeta)$.

Theorem 4. Under assumptions A-C and the local alternative
$\int dP_{\psi_0 + h/\sqrt{n} + o(n^{-1/2}),\,\zeta}\; dQ_\zeta(h_\beta)\, dJ(\zeta)$, $ELR_n = EW_n + o_P(1) = ER_n + o_P(1) \to_d r_\chi(c)$,
where $r_\chi(c)$ is a real random variable such that its cumulative
distribution function $\Pr(r_\chi(c) \le t) = P_0[1\{e_\chi(c) \le t\}\, e_\chi(c)]$.
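
Theorem 4 says that the law of $r_\chi(c)$ is the null law of $e_\chi(c)$ reweighted by $e_\chi(c)$ itself, so null draws can be reused to approximate it. The sketch below assumes an array of Monte Carlo draws of $e_\chi(c)$ under the null is already available; both function names are hypothetical, and reading the limiting weighted average power as one minus this distribution function at the null critical value is an interpretation of the theorem, not a formula quoted from the paper.

```python
import numpy as np

def tilted_cdf(null_draws, t):
    """Estimate Pr(r_chi(c) <= t) = P_0[1{e_chi(c) <= t} e_chi(c)]."""
    e = np.asarray(null_draws, dtype=float)
    return float(np.mean(e * (e <= t)))

def average_power(null_draws, alpha):
    """Approximate limiting weighted average power of the level-alpha test."""
    e = np.asarray(null_draws, dtype=float)
    crit = np.quantile(e, 1.0 - alpha)   # critical value from the null draws
    return 1.0 - tilted_cdf(e, crit)
```

The estimate is a proper distribution function only when the draws average to one, which holds in expectation because $e_\chi(c)$ has mean one under the null.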



3.4. Monte Carlo computation and inference. Although we have obtained 
the asymptotic distributions of the test statistics, these distributions gener- 
ally have complicated analytic forms which depend on the values of unknown 
nuisance parameters. We now introduce a weighted bootstrap method to obtain
the asymptotically valid critical values of $e_\chi(c)$. This method does not
require explicit evaluation of the limiting distribution, thereby avoiding the 
numerical difficulties inherent in such an evaluation. 

We first generate $n$ i.i.d. positive random variables $\kappa_1, \ldots, \kappa_n$, with mean
$0 < \mu_\kappa < \infty$, variance $0 < \sigma_\kappa^2 < \infty$ and with $\int_0^\infty \sqrt{P(\kappa_1 > u)}\, du < \infty$. Next,
we divide each weight by the sample average of the weights $\bar\kappa$, to obtain
"standardized weights" $\kappa_1^\circ, \ldots, \kappa_n^\circ$ which sum to $n$. For a real, measurable
function $f$, define the weighted empirical measure $\mathbb{P}_n^\circ f = n^{-1}\sum_{i=1}^n \kappa_i^\circ f(X_i)$.
Let $\hat\psi_n^\circ(\zeta) = (\hat\beta_n^\circ(\zeta), \hat\eta_n^\circ(\zeta))$ denote the maximizer of $l_n^\circ(\psi, \zeta)$ over $\psi \in \Psi$ at
fixed $\zeta \in \Xi$, where $l_n^\circ$ is obtained by replacing $\mathbb{P}_n$ with $\mathbb{P}_n^\circ$ in the definition
of $l_n$. Similarly, let $\hat\psi_0^\circ(\zeta) = (\hat\beta_0^\circ(\zeta), \hat\eta_0^\circ(\zeta))$ denote the maximizer of
$(l_n^0)^\circ(\psi, \zeta)$ over $\psi \in \Psi_0$ at fixed $\zeta \in \Xi$, where $(l_n^0)^\circ$ is obtained by replacing
$\mathbb{P}_n$ with $\mathbb{P}_n^\circ$ in the definition of $l_n^0$, the log likelihood under the null.
Now repeat the bootstrap procedure a large number of times $M_n$ and compute
the differences of the bootstrapped unrestricted MLE and restricted
MLE of $\beta$: $d\beta_k^\circ(\zeta) = \hat\beta_{n,k}^\circ(\zeta) - \hat\beta_{0,k}^\circ(\zeta)$, $k = 1, \ldots, M_n$, as processes of $\zeta$. Note
that we are allowing the number of bootstraps to depend on $n$. Define

$\zeta \mapsto A_n(\zeta) = M_n^{-1} \sum_{k=1}^{M_n} d\beta_k^\circ(\zeta)$ and let

$\zeta \mapsto V_n(\zeta) = M_n^{-1} \sum_{k=1}^{M_n} \big(d\beta_k^\circ(\zeta) - A_n(\zeta)\big)\big(d\beta_k^\circ(\zeta) - A_n(\zeta)\big)'.$

To estimate critical values, we compute the standardized bootstrap test
statistics

$E_{n,k}^\circ = (1 + c)^{-p/2} \int \exp\Big\{\frac{1}{2}\,\frac{c}{1+c}\,\big(d\beta_k^\circ(\zeta) - A_n(\zeta)\big)'\, V_n^{-1}(\zeta)\,\big(d\beta_k^\circ(\zeta) - A_n(\zeta)\big)\Big\}\, dJ(\zeta),$

for $1 \le k \le M_n$. For a test of size $\alpha$, we compare the observed test statistics
with the $(1 - \alpha)$th quantile of the corresponding $M_n$ standardized bootstrap
statistics. The reason we subtract off the mean is to ensure that we obtain a 
valid approximation to the null distribution when the null hypothesis may 
not be true. If not, then there may be loss of power, although the type I 
error rate will still be controlled when the null is true. The proof of the 
bootstrap validity can be built upon the proof of Theorems 7 and 8 in 
Kosorok and Song (2007). We omit the details. 
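
The centering, standardization and quantile steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the model-specific weighted MLE fits are taken as given in a hypothetical array `dbeta` of shape `(M, G, p)` holding the bootstrap differences on a grid of `G` values of $\zeta$, the integral over $J$ is replaced by quadrature weights `j_weights`, and all function names are invented for the example.

```python
import numpy as np

def standardized_weights(n, rng, cap=5.0):
    """Positive i.i.d. weights divided by their sample mean, so they sum to n.

    Assumption: the truncation is implemented by capping Exp(1) draws at
    `cap`; a conditional (rejection) truncation is an equally valid reading.
    """
    kappa = np.minimum(rng.exponential(1.0, size=n), cap)
    return kappa / kappa.mean()

def bootstrap_exp_stats(dbeta, j_weights, c):
    """Standardized exponential average statistic for each bootstrap replicate."""
    dbeta = np.asarray(dbeta, dtype=float)
    M, G, p = dbeta.shape
    A = dbeta.mean(axis=0)                                   # A_n(zeta), shape (G, p)
    centered = dbeta - A                                     # subtract off the mean
    V = np.einsum("mgi,mgj->gij", centered, centered) / M    # V_n(zeta), (G, p, p)
    V_inv = np.linalg.inv(V)                                 # inverse per grid point
    quad = np.einsum("mgi,gij,mgj->mg", centered, V_inv, centered)
    integrand = np.exp(0.5 * c / (1.0 + c) * quad)
    return (1.0 + c) ** (-p / 2.0) * integrand @ np.asarray(j_weights, dtype=float)

def bootstrap_critical_value(dbeta, j_weights, c, alpha):
    """(1 - alpha) quantile of the standardized bootstrap statistics."""
    return float(np.quantile(bootstrap_exp_stats(dbeta, j_weights, c), 1.0 - alpha))
```

The observed statistic would then be compared with `bootstrap_critical_value(dbeta, j_weights, c, alpha)`. The sketch requires more replicates than the dimension of $\beta$ so that each estimated covariance matrix is invertible.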



4. Examples. In this section, we study the two examples in the intro- 
duction to illustrate the two types of nonidentifiability settings, one where a 
nuisance parameter is present under the null and one where it is not. These 
examples demonstrate important differences in how the full rank reparam- 
eterizations and uniformly least favorable submodels are defined in the two 
settings. We present Example 2 first because a reparameterization is not 
required, simplifying the presentation. 

4.1. Example 2 revisited: Change-point regression for current status data.
In the change-point Cox model with current status data, a test of the existence
of a threshold effect corresponds to a test of the null $H_0: \beta = 0$. The
change-point parameter $\zeta$ is present only under the alternative. Hence it
suffices to take $\phi_\zeta$ as the identity map.

We make the following assumptions and will argue that the assumptions
in Section 2 can be checked under these assumptions. Given $Z$, $T$ and $V$
are independent, and $Z$ belongs to a compact subset of $\mathbb{R}$. The change-point
parameter $\zeta \in [a, b]$, for some known $-\infty < a < b < \infty$ with $\Pr(Z < a) > 0$
and $\Pr(Z > b) > 0$. Assume $P(\mathrm{Var}(Z|V)) > 0$, which guarantees that, as
we will show later, the efficient information $\tilde I_\beta(\theta_0(\zeta))$ is positive definite
uniformly over $\zeta \in \Xi$. The Lebesgue density of $V$ is positive and continuous
on its support $[\sigma, \tau]$ with $0 < \sigma < \tau < \infty$. The baseline hazard function $\Lambda_0$
is continuously differentiable on $[\sigma, \tau]$, with derivative that is bounded away
from 0 and satisfies $\Lambda_0(\sigma) > 0$, $\Lambda_0(\tau) < M$, for some known $M$. We let $\mathcal{H}_\Lambda$
denote a set of nondecreasing cadlag functions $\Lambda$ on $[\sigma, \tau]$ with $\Lambda(\tau) \le M$.

The likelihood function equals (2) with $f_{V,Z}(v, z)$ removed, because it
can be absorbed into the underlying measure on the sample space. The
log-likelihood for a single observation $\log l_1(\theta)$ takes the form $\log l_1(\theta) =
\delta \log[1 - \exp\{-\Lambda(v)\exp(r_\gamma(z))\}] - (1 - \delta)\exp(r_\gamma(z))\Lambda(v)$. Define $Z(\zeta) =
(1\{Z > \zeta\}, Z1\{Z > \zeta\}, Z)$ and note that with such a data representation we
can adopt much material in the literature and hence simplify our arguments.

To define a uniformly least-favorable submodel in $\beta$, we take two steps.
For Step 1, we calculate scores for $\gamma$ and $\Lambda$. The score function for $\gamma$ is
$\ell_\gamma(x) = z(\zeta)\Lambda(v)Q(x; \theta)$ with

$Q(x; \theta) = e^{r_\gamma(z)}\Big[\delta\,\frac{e^{-e^{r_\gamma(z)}\Lambda(v)}}{1 - e^{-e^{r_\gamma(z)}\Lambda(v)}} - (1 - \delta)\Big].$

The score operator for $\Lambda$ along $\Lambda_t = \Lambda + th$ with $t > 0$ and $h$ a nondecreasing
nonnegative right continuous function, is given by

$\ell_\Lambda(h)(x) = \frac{\partial}{\partial t}\log p(x; \gamma, \Lambda_t)\Big|_{t=0} = h(v)Q(x; \theta).$

We project $\ell_\gamma(X)$ onto the space generated by $\ell_\Lambda$. That is, we need to find
a function $h_\zeta^*(V) \in \mathcal{H}_\Lambda$ such that $\ell_\gamma - \ell_\Lambda(h_\zeta^*) \perp \ell_\Lambda(h)$, for all $h \in \mathcal{H}_\Lambda$, which
is equivalent to solving the least squares problem of minimizing $P_\theta\|\ell_\gamma - \ell_\Lambda h\|^2$ over $h$. The
solution under the null is $h_\zeta^*(V) = \Lambda_0(V)h_\zeta^{**}(V)$, where $h_\zeta^{**}(V) = P(Z(\zeta)Q^2(X; \theta)\,|\,V)/
P(Q^2(X; \theta)\,|\,V)$, which is assumed to possess a version that is differentiable
componentwise with the derivatives being bounded on $[\sigma, \tau]$ uniformly over
$\zeta \in \Xi$. It can be shown that $\Lambda_t(\theta)$ is indeed a hazard function when $t$ is
sufficiently close to $\gamma$.

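The uniformly least-favorable direction for $\gamma$ is $\Lambda_t(\theta) = \Lambda + (\gamma - t)'\phi(\Lambda)\,h_\zeta^{**} \circ
\Lambda_0^{-1} \circ \Lambda$. Here $\phi$ is a function mapping $[0, M]$ into $[0, \infty)$ such that $\phi(y) = y$
on $[\Lambda_0(\sigma), \Lambda_0(\tau)]$ and the function $y \mapsto \phi(y)/y$ is Lipschitz and $\phi(y) \le
c(\min(y, M - y))$ for a sufficiently large constant $c$. The efficient score for $\gamma$
for this uniformly least-favorable submodel is given by:

$\tilde\ell_\gamma(x; t, \theta) = \Big(z(\zeta) - \frac{\phi(\Lambda(v))\,h_\zeta^{**} \circ \Lambda_0^{-1} \circ \Lambda(v)}{\Lambda_t(\theta)(v)}\Big)\,\Lambda_t(\theta)(v)\,Q(x; t, \Lambda_t(\theta)).$

$\Lambda_0^{-1}$ may be extended to $[0, \infty)$ by setting $\Lambda_0^{-1}(u) = \sigma$ for $u < \Lambda_0(\sigma)$ and
$\Lambda_0^{-1}(u) = \tau$ for $u > \Lambda_0(\tau)$.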

For Step 2, we next project $\tilde\ell_\beta(x)$ onto the space generated by $\tilde\ell_\alpha$. The
efficient score function for $\beta$, $\tilde\ell_\beta$, is the first two coordinates of $\tilde\ell_\gamma$ minus
its projection on the remaining coordinates of $\tilde\ell_\gamma$. Since $\alpha$ lies in a finite-dimensional
space, the projection path has a matrix representation. The
efficient information for $\gamma$, $\tilde I_\gamma$, can be partitioned as a two-by-two block
matrix, with $\tilde I_\gamma^{11}(\theta)$ denoting its first two-by-two principal submatrix, and
so on. We define v' e = {l,-(if)- l lf), and & (0) =£-(0- t)v e . We also 
define A t (0) = A + (&(0) - t)'(p(A)hf o A 1 o A. 

Now we use the uniformly least-favorable path 1 1— ► (£t (0), A 4 (0)) in the pa- 
rameter space for the nuisance parameter 77 = (a, A). This leads to l(t, (3, a, A) = 
log/(^(0), A t (0)). This submodel is least favorable at (£o> Ao) uniformly over 
( G 3 since d/dt\t=p l(t, Po, a, A) = f^g,. whereas z^/^ = lp. The efficient in- 
formation matrix for f3 is, Ig = I^ 1 — /| (l| 2 ) _1 /| (0). The remainder of 
assumption B4 can be verified by standard empirical process arguments. 

To verify assumption A1 in Section 2, it suffices to show that $\tilde I_\beta$ is uniformly
positive definite over $\zeta \in \Xi$, which can be achieved by checking that
$\inf_{\zeta \in \Xi} \lambda_{\min}\{P_0(\mathrm{Cov}(Z(\zeta)|V))\} > 0$. We first show that the components of the random vector $(Z,
1\{Z > \zeta\}, Z1\{Z > \zeta\})$ are linearly independent given $V$, pointwise in $\zeta \in \Xi$.
Suppose that given $V$,

(6) $aZ + b1\{Z > \zeta\} + cZ1\{Z > \zeta\} = 0,$

a.s., for some constants $a$, $b$ and $c$. Our aim is to show $a = b = c = 0$. When
$Z \le \zeta$, (6) becomes $aZ1\{Z \le \zeta\} = 0$. Since $\mathrm{Var}(Z|V) > 0$ and $P(Z < \zeta|V) >
0$, for every $\zeta \in \Xi$, $\mathrm{Var}(Z|Z < \zeta, V) > 0$, and therefore $a = 0$. When $Z > \zeta$,
(6) becomes $(b + cZ)1\{Z > \zeta\} = 0$. If $c \ne 0$, $Z = -b/c$, which contradicts
the fact that $\mathrm{Var}(Z|Z > \zeta, V) > 0$. Thus we conclude that $c = 0$ and
$b = 0$ as a consequence. That $P(\mathrm{Cov}(Z(\zeta)|V))$ is uniformly positive definite
over $\zeta \in \Xi$ follows since $P(\mathrm{Cov}(Z(\zeta)|V))$ is a continuous function of $\zeta$ and
$\Xi$ is compact.

The profile likelihood estimator $\hat\psi_n(\zeta)$ can be shown to be consistent for
$(\beta_0, \Lambda_0)$ by a similar proof as used for the full maximum likelihood estimator
in Huang (1996). The following lemma shows the uniform consistency of
$\hat\psi_n(\zeta)$ under the null.

Lemma 3. $\hat\psi_n(\zeta) - \psi_0 = o^{\Xi}_{P_0}(1)$.

To verify the uniform no-bias condition (4), we need the following result
about the uniform rate of convergence.

Lemma 4. Suppose that $d(\eta, \eta_1) : \eta, \eta_1 \in \mathcal{H}_\eta$ is the metric defined on $\mathcal{H}_\eta$,
and $C_1$, $C_2$ and $C_3$ are positive constants with,

(7)

and

(8) $P_0 \sup_{\beta \in B,\ \eta \in \mathcal{H}_\eta,\ \|\beta - \beta_0\| < \delta,\ d(\eta, \eta_0) < \delta,\ \zeta \in \Xi} |\mathbb{G}_n(m_{\beta,\eta,\zeta} - m_{\beta,\eta_0,\zeta})| \le C_3\,\phi_n(\delta),$

for functions $\phi_n$ such that $\delta \mapsto \phi_n(\delta)/\delta^a$ is decreasing for some $a < 2$ and
sets $B \times \mathcal{H}_\eta \times \Xi$ such that under the null $\Pr(\tilde\beta_n \in B, \hat\eta_{\tilde\beta_n}(\zeta) \in \mathcal{H}_\eta, \zeta \in \Xi) \to 1$.
Then $\sup_{\zeta \in \Xi} r_n\, d(\hat\eta_{\tilde\beta_n}(\zeta), \eta_0) \le O_P(1 + r_n\|\tilde\beta_n - \beta_0\|)$ for any sequence of
positive numbers $r_n$ such that $r_n^2\,\phi_n(1/r_n) \le \sqrt{n}$ for every $n$.

We apply Lemma 4 with $\eta = (\alpha, \Lambda)$, $\mathcal{H}_\eta = \mathbb{R} \times \overline{\mathcal{H}}_\Lambda$, where $\overline{\mathcal{H}}_\Lambda$ is the closed
linear span of $\mathcal{H}_\Lambda$, $d(\eta, \eta_1) = \|\alpha - \alpha_1\| + \|\Lambda - \Lambda_1\|_2$ and

$m_{\beta,\eta,\zeta} = \log\frac{p_{\beta,\eta,\zeta}}{p_{\beta_0,\eta_0}}$, if $\eta = \eta_0$,

$m_{\beta,\eta,\zeta} = 2\log\frac{p_{\beta,\eta,\zeta} + p_{\beta_0,\eta_0}}{2\,p_{\beta_0,\eta_0}}$, otherwise.

Condition (7) can be established by the Taylor expansion and the uniform
boundedness on the derivatives of the loglikelihood. Condition (8) can be
verified using Lemma 3.3 of Murphy and van der Vaart (1999), with the
choice $\phi_n(\delta) = \delta^{1/2}\{1 + M\delta^{-3/2}/\sqrt{n}\}$, where $M > \|m_{\beta,\eta,\zeta}\|_\infty$ is a constant.

These conditions imply that $\|\hat\alpha_{\tilde\beta_n}(\zeta) - \alpha_0\| + \|\hat\Lambda_{\tilde\beta_n}(\zeta) - \Lambda_0\|_2 = O^{\Xi}_{P_0}(\|\tilde\beta_n -
\beta_0\| + n^{-1/3})$, for any sequence $\tilde\beta_n \to \beta_0$. Now we only need to verify

(9) $P_0\,\tilde\ell_\beta(\beta_0, \hat\eta_{\tilde\beta_n}(\zeta), \zeta) = o^{\Xi}_{P_0}(\|\tilde\beta_n - \beta_0\| + n^{-1/2}),$

which is equivalent to (4) under regularity conditions. We further decompose
(9) as (17) in Murphy and van der Vaart (2000), which can be easily verified
by the Taylor expansion and the uniform boundedness on the first and second
derivatives of the loglikelihood.

It is not difficult to see that $\{p_{\gamma,\Lambda}(\zeta)\}$ is differentiable in quadratic mean
at $(\psi_0, \zeta)$ with respect to the set of directions $\{\gamma_0 + th_1, \Lambda_0 + th_2\}$, where
$h_1 \in \mathbb{R}^3$, and $h_2$ is a nondecreasing nonnegative right continuous function.
Thus all conditions in Section 2 are satisfied.
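
The $n^{-1/3}$ rate used above comes from solving $r_n^2\,\phi_n(1/r_n) \le \sqrt{n}$ for the modulus $\phi_n(\delta) = \delta^{1/2}\{1 + M\delta^{-3/2}/\sqrt{n}\}$. Substituting $r_n = \kappa n^{1/3}$ gives $r_n^2\,\phi_n(1/r_n) = \sqrt{n}\,\kappa^{3/2}(1 + M\kappa^{3/2})$, so any $\kappa$ with $\kappa^{3/2}(1 + M\kappa^{3/2}) \le 1$ works. The short check below uses the particular choice $\kappa = (1 + M)^{-2/3}$, which is an illustrative constant chosen for this sketch and not one given in the paper.

```python
import math

def phi_n(delta, n, M):
    """Modulus phi_n(delta) = delta^{1/2} * (1 + M * delta^{-3/2} / sqrt(n))."""
    return math.sqrt(delta) * (1.0 + M * delta ** (-1.5) / math.sqrt(n))

def rate(n, M):
    """Candidate rate r_n = kappa * n^{1/3} with kappa = (1 + M)^{-2/3}."""
    return (1.0 + M) ** (-2.0 / 3.0) * n ** (1.0 / 3.0)

def rate_condition_holds(n, M):
    """Check r_n^2 * phi_n(1 / r_n) <= sqrt(n), up to floating-point slack."""
    r = rate(n, M)
    return r ** 2 * phi_n(1.0 / r, n, M) <= math.sqrt(n) * (1.0 + 1e-12)
```

With this $\kappa$ the left-hand side equals $\sqrt{n}\,(1 + 2M)/(1 + M)^2$, which never exceeds $\sqrt{n}$, so the condition holds for every $n$ and every $M > 0$.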



4.2. Example 1 revisited: Univariate frailty regression under right censoring.
The odds-rate model we consider in this paper posits that the hazard
function has the form (1). We define $g_\zeta(s) = (1 + \zeta s)^{-1/\zeta}$, for $\zeta > 0$, and
$g_0(s) = \lim_{\zeta \downarrow 0} g_\zeta(s) = \exp(-s)$. Let $S_Z(\cdot)$ denote the survival function of $T$
given $Z$, and after integrating over $W$, $S_Z(t)$ becomes $g_\zeta\big(\int_0^t e^{\beta' Z(u)}\, d\eta(u)\big)$,
where the cumulative baseline hazard function $\eta(\cdot)$ is a nonnegative, monotone
increasing cadlag (right-continuous with left-hand limits) function. We
will argue later that assumptions A-C can be checked under the following
conditions. The true null survival function is unique and denoted as $S_0$. The
censoring time $C$ is independent of $T$ given $Z$ and uninformative of $\zeta$ and
$\beta$. Moreover, for a finite time point $\tau$, $P_0 1\{C \ge \tau\} = P_0 1\{C = \tau\} > 0$ almost
surely. $\zeta \in \Xi = [0, K_0]$ for some known $K_0 < \infty$. The null value $\beta_0 = 0$ is an
interior point of a known compact set $B_0 \subset \mathbb{R}^p$. The parameter space for
$\eta$, $\mathcal{H}_\eta$, is a Banach space consisting of continuous and monotone increasing
functions on the interval $[0, \tau]$ equipped with the total variation norm
$\|\cdot\|_v$. Its closed linear span is denoted as $\overline{\mathcal{H}}_\eta$. The function $\eta(\cdot) \in \mathcal{H}_\eta$ satisfies
$\eta(0) = 0$ and $\eta(\tau) < \infty$. The covariate process $Z(\cdot)$ is uniformly bounded in
total variation on $[0, \tau]$ and $\mathrm{var}[Z(0+)]$ is positive definite.

The true values of $\pi = (\eta, \zeta)$ are not unique under the null, since the null
set $\Pi_0$ contains all pairs of $(\eta, \zeta)$ satisfying, for $t \in [0, \tau]$, $(1 + \zeta\eta(t))^{-1/\zeta} =
S_0(t)$, when $\zeta \in (0, K_0]$; and $\exp(-\eta(t)) = S_0(t)$, when $\zeta = 0$. In this example,
$\zeta$ appears both under the null and the alternative. Equivalently, for any fixed
$\zeta \in (0, K_0]$, $\eta_0(t)(\zeta) = (S_0(t)^{-\zeta} - 1)/\zeta$ and for $\zeta = 0$, $\eta_0(t)(\zeta) = -\log(S_0(t))$,
$t \in [0, \tau]$. Hence $\Pi_0 = \{(\zeta, \eta_0(\zeta)) : \zeta \in \Xi\}$. Thus we need a suitable parameter
transformation for this example. Let $\tilde\eta = \phi_\zeta(\eta) = (1 + \zeta\eta)^{1/\zeta} - 1$, for $\zeta > 0$;
and $\tilde\eta = \lim_{\zeta \downarrow 0} \phi_\zeta(\eta) = \exp(\eta) - 1$. It can be easily checked that $\tilde\eta \in \mathcal{H}_{\tilde\eta}$.
The following arguments reveal that the map $\phi_\zeta(\eta) : \mathcal{H}_\eta \mapsto \mathcal{H}_{\tilde\eta}$ is a full-rank
reparameterization.
The log likelihood function with the new parameter $\theta = (\beta, \tilde\eta, \zeta)$ is

(10) $\mathbb{P}_n\Big[\delta\{\log \tilde a(v) + (\zeta - 1)\log(\tilde\eta(v) + 1) + \beta' z(v)\} - \frac{1 + \delta\zeta}{\zeta}\log\Big\{1 + \zeta\int_0^v e^{\beta' z(s)}(\tilde\eta(s) + 1)^{\zeta - 1}\, d\tilde\eta(s)\Big\}\Big],$

where $\tilde a(\cdot)$ is the derivative of $\tilde\eta(\cdot)$. We will replace $\tilde a(\cdot)$ with $n\Delta\tilde\eta(\cdot)$ in the
sequel, since this form of the empirical log-likelihood function is asymptotically
equal to the true log-likelihood function. When $\beta = 0$, it is clear that
$\zeta$ vanishes since (10) $= \mathbb{P}_n\{\delta\log\Delta\tilde\eta(v) - (\delta + 1)\log(1 + \tilde\eta(v))\}$, and $\tilde\eta(0) = 0$.
The odds-rate model with new parameterization $\psi = (\beta, \tilde\eta)$ is identifiable under
the null, since the null survival function $S_0(t|z) = (1 + \tilde\eta(t))^{-1}$ is a strictly
monotone function of $\tilde\eta$ and is unique.

The Gateaux derivative of $\phi_\zeta(\eta)$ at $\eta \in \mathcal{H}_\eta$ exists and is obtained by
differentiating $\phi_\zeta(\eta)$ along the submodels $t \mapsto \eta + th$. This derivative is
$\dot\phi_\zeta(\eta)(h) = d/dt\,\phi_\zeta(\eta + th)|_{t=0} = (1 + \zeta\eta)^{1/\zeta - 1}h$ for $\zeta > 0$ and $\exp(\eta)h$ for
$\zeta = 0$.

The Gateaux differentiability of $\phi_\zeta(\eta)$ pointwise in $\zeta$ can be strengthened
to uniform Frechet differentiability by noticing that

$\lim_{t \downarrow 0}\ \sup_{\zeta \in \Xi}\ \sup_{\|h(\zeta)\|_v \le r} \Big\|\int_0^1 \{\dot\phi_\zeta(\eta + sth(\zeta)) - \dot\phi_\zeta(\eta)\}\, ds\Big\|_v = 0,$

for any $r > 0$. Thus $\sup_{\zeta \in \Xi} \|\phi_\zeta(\eta + h(\zeta)) - \phi_\zeta(\eta) - \dot\phi_\zeta(\eta)(h(\zeta))\|_v / \|h(\zeta)\|_v = o(1)$,
as $\|h(\zeta)\|_v \to 0$ uniformly over $\zeta \in \Xi$, which we will
hereafter refer to as "uniformly Frechet differentiable." Since $\dot\phi_\zeta(\eta)(h)$ is
uniformly bounded and Lipschitz in $h$, by checking the definition, we can
show that $\phi_\zeta$ is one-to-one and continuously invertible uniformly over $\zeta \in \Xi$.

To define a uniformly least-favorable submodel, we calculate scores for 
(3 and fj. Let 7i denote the space of elements h = (hi, hi) such that hi G W 
and /i2 £ 'Hti- Consider the one-dimensional submodel defined by the map 

t \— > i]) t = ip + t(/ii,/g ^ h2(u) drj(u)) , h eH. The derivative of log£ n (^,£) 
with respect to i evaluated at t = yields score operators £ n (ip,C)(h) = 
(inp(hi),£nrj(h2)), where 

= P re i /3 (/n) 



Sh'^X) 



(1 + *C) 



and 



-(i + *C) 

x ry(u)e^W(77(«) + l) C - 2 



(C-l)/ /i 2 (s)^(s) + /i2(«)(l+^(«)) 



drj(u) 



x (\ + C^V(u)e^ z(u) (^(u) + l) C_1 ^(u)) 



with y(u) = l{y>n}. 

To obtain the information operator, we consider the two-dimensional sub- 
model defined by the map (s,t) \-> ip st = ip + Jq ^ h2(u) ctrj(u)) + t(^i, 
Jo ^ ^2( u ) drj(u)), where h,h 6H. Define Woo = {h £Tt: \\h\\-n < 00}. The in- 
formation operator Wg(h) :TLoo ^Ti-oo is given by — Pod/dsdt£i(ip st )\ s ^=o = 
ip(Wa(h)). We will show 0S7 is one-to-one, continuously invertible and onto 
uniformly over $\zeta \in \Xi$, via Part (1) of Lemma 7 in Section 7, for which
it suffices to show that the information operator for the original parame- 
terization gq is one-to-one, continuously invertible and onto uniformly over 

Cg~. 

With the same derivation of a-g , a$ : TL^ i— ► takes the form 
where 

offa) = -P S{6) i T h! 1 Z{u)Y{u)eP z tod<n Q {u), 
Jo 

af(h 2 ) = -P S(6) [ T h 2 (u)Z{u)Y{uy' z ^ d Vo (u), 
Jo 

of (h) = -P S(9)(1 + (r,(T A T)(0)KZ(u)Y(u)e^ z ^ 

- (PoS(e)Y(u) f T h[Z(u)Y(u)e^ Z ^ d m (u), 

Jo 

of (he) = -P S(0)(1 + (r,(T A r){Q)h 2 (u)Y [u^' z ^ 

- (PoS(9)Y(u) f T h 2 (u)Y(u)e^' z ^ d m (u) , 

Jo 

with S(6) = -(1 + 5C)/(1 + Cv(t)) 2 . 

All of the operators $\sigma_\theta^{ij}$, $1 \le i, j \le 2$, are uniformly compact and bounded
over $\zeta \in \Xi$. With a similar argument as in Kosorok, Lee and Fine (2004),
the linear operator $\sigma_\theta : \mathcal{H}_\infty \mapsto \mathcal{H}_\infty$ is one-to-one, continuously invertible
and onto uniformly over $\zeta \in \Xi$ by verifying the conditions of Lemma 8
in the Section 7. Thus a uniformly least-favorable submodel for estimat- 
ing /3 in the presence of rj and ( is Tj t (f3,r}X) — (1 + (P — tY^dr], where 
!^:Mh >W is the uniformly least-favorable direction at (Po,r},C) defined by 
h'v e -=(of)- 1 afh, heW. This leads to £(t,j3,rj,Q = £i((3,rj t (9),()- Be- 
canserjp(/3,r},() = r?, Blis satisfied. Since d/dt\ t =p £(t,/3 , rj , C) = £p{Po,il>o,0 = 
ip(ipQ, C)j where ^(x) = £p — is the efficient score for /?, B2 is satisfied 
due to the continuity of the involved functions with respect to if) and the 
fact that 3 is compact. The efficient information for /3 is Ip = Poipi'n. That 

{£(t , C) : t G E7, ^ G V, C G 3} is P -Donsker and ^, C) : t G 17, ^ G V, ( G 
3} is Po-Glivenko-Cantelli for some neighborhoods U and V follows from 
standard empirical process arguments. 

It follows from Corollary 8.1.3 in Golub and Van Loan (1983) that the set 
of eigenvalues is a continuous function of the elements of $\tilde I_\beta(\theta)$, which are
continuous functions of $\zeta$. The set of eigenvalues is therefore a continuous
function of $\zeta$. Thus $\inf_{\zeta \in \Xi} \lambda_{\min}\{\tilde I_\beta(\theta_0(\zeta))\} > 0$ by the compactness of $\Xi$, and
assumption A1 is satisfied.

The consistency of the restricted MLE $\hat\psi_0$ and the uniform consistency
of the unrestricted MLE $\hat\psi_n(\zeta)$ can be established via the self-consistency
equation approach, with arguments similar to the proof of Theorem 3 in
Kosorok, Lee and Fine (2004). We omit the details. To verify the uniform 
no-bias condition (4), it suffices to show that 

- Police = 0%(W n -A,|| +n- 1 ' 2 ) 

for any sequence j3 n — ► (3q , 

where "*" denotes outer probability. By verifying conditions in Lemma 9 in 
the Section 7, we have 

sup(P n - Po){%(^n,%JC), " %(A),% C)} = op^ 1 ' 2 ). 

Together with the fact that F n ^0 n ,fjp n ((),() = P %(/?o,%> C) = 0, we ob- 
tain 

Po{%(/3n,% n (C),C)-%(/3o,r/o,C)} 

= Po%(^n,% n (C),C)-Pn%(/3 n ,% n (C),C) 

= -(P n - P )^((3 ,rj , C) + op^ 1 ' 2 ), 

uniformly over Q G H. 

Let Z^(fo) = {lp{hi),l rj {h2)) denote the score operator of tp with the original 
parameterization. It was shown in Kosorok, Lee and Fine (2004) that the 
operator -0 i — is Frechet differentiable with derivative ^(<7g(/i)), and it can 
be strengthened to uniform Frechet differentiability due to the smoothness of 
the involved functions. Since <f>^ is uniformly Frechet differentiable, by Part 
(2) of Lemma 7, the chain rule for uniform Frechet differentiability, l-^ = 
(£[3,£rj) is uniformly Frechet differentiable with derivative cr^-i^ o^ 1 (0). 

By the uniform Frechet differentiability of 1^, 

^(/3n:%„(C) -%) = P o {%0i,% n (C),C) - %(/?o,%,C)} 

+ Op (||/3n- A)|| + ll%„(0 — »7olloo). 

Since a-g is linear, the first term on the right-hand side is of the order 
Op^n- 1 / 2 ). It follows that sup CeH 117/^(0-% II oo = O^ (||/3 n -/3 1| +n" 1 /2), 
since a-* is uniformly continuously invertible over £ E S. 
5. Simulation results. This section presents simulation results regarding
the finite-sample properties of the proposed optimal test statistics for
Example 2, the change-point Cox model with current status data. The sim- 
ulation study was designed with several objectives. First, we demonstrate 
how to compute the asymptotic critical values with the proposed weighted 
bootstrap procedure. Second, we analyze the empirical type I error of the 
proposed tests and compare with the nominal size of the tests. Third, we 
compare the power of the optimal tests with that of other tests such as the 
sup score statistics (equivalent to the likelihood ratio statistic) and some 
naive (pointwise) tests under several different alternatives. Fourth, we eval- 
uate the sensitivity of the power of the optimal test to the choice of c under 
several different alternatives. 

A single time-independent covariate Z with a uniform [0, 1] distribution 
was used. The threshold covariate Y = Z. The parameter a was set at ao = 

0. with the cumulative baseline hazards Ao(t) = 3t 2 . The censoring time 
was uniformly distributed on the interval [0,5]. This resulted in a censoring 
rate of about 25% under the null hypothesis. Under the alternative, we set 
$\beta_{10} = -0.5$, $\beta_{20} \in \{-0.3, -0.5, -0.8\}$. The range of $\beta_{20}$ values reflects the
distance from the null. We consider the following alternative distributions
of $\zeta$:

1. The weight $J(\zeta)$ degenerates to one point at 0.5, that is $\zeta = 0.5$.

2. A uniform weight $J(\zeta)$ with support on [0.05, 0.95].

For all the scenarios, we compute the optimal tests with a uniform weight 
on [0.05,0.95]. The sample size for each simulated data set was 300. For each 
simulated data set, 250 bootstraps were generated with standard exponential
weights truncated at 5, to compute the critical values for $R_n(\zeta)$, the
naive score statistic at several $\zeta$ values, $\sup_\zeta R_n(\zeta)$, the sup score statistic
and $ER_n$, the weighted exponential score statistic. We take $c = 0, 0.5, 1, 3$
and $\infty$, respectively. Each scenario was replicated 1000 times. To compute
the restricted MLE under the null, we use the iterative convex minorant 
algorithm. Empirical type I error and power results for selected subsets of 
the test statistics described above are provided in Table 1. 

We now make several general comments on the simulation results. The 
empirical type I error for all the tests is quite close to the nominal level. 
When the alternative distribution of $\zeta$ is correctly specified, the optimal test
is notably more powerful than the sup score statistic and naive tests. When
the true alternative distribution of $\zeta$ degenerates to one point, although the
weighted exponential tests are no longer optimal, the empirical powers are
still superior to the naive tests with misspecified $\zeta$. We also observe that the
empirical power of the sup score statistics is comparable to that of the naive
test at the true $\zeta$, which may be due to the fast convergence rate of the
change-point estimator. For all the alternatives considered, the empirical 
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Table 1 

The empirical type I error and power of the proposed tests, sample size n = 300, 1000 
simulations, with bootstrap size 250. The worst case Monte Carlo error for table entries is 
0.016. The Monte Carlo error is 0.001 and 0.009 for empirical type I error with nominal 
size 0.05 and 0.1, respectively. The empirical power results are based on size 0.05 tests 



Empirical type I error 

„ . , Weighted exponential tests, 


c = 


Sup score 


Naive tests R n 


(C), c = 


size 0.5 


1 


3 


00 


0.3 


0.9 


0.5 


J(0 ~Uniform[0.05,0.95] 
0.05 0.056 0.057 
0.10 0.098 0.103 


0.045 
0.109 


0.063 
0.095 


0.046 
0.099 


0.058 
0.085 


0.043 
0.112 


0.044 
0.103 


0.039 
0.100 


Empirical power 

_ Weighted exponential tests, 


c = 


Sup score 


Naive tests R n 


(C),C = 


alternative 0.5 


1 


3 


00 


0.3 


0.9 


0.5 


J(C) ~ Uniform [0.05, 0.95] 
















C = 0.5 
















77 = -0.3 0.646 0.647 


0.653 


0.653 


0.656 


0.688 


0.243 


0.044 


0.692 


7) = -0.5 0.835 0.833 


0.839 


0.845 


0.847 


0.865 


0.616 


0.076 


0.840 


77 = -0.8 0.922 0.925 


0.928 


0.928 


0.928 


0.968 


0.957 


0.174 


0.942 


J(0 ~ Uniform [0.05, 0.95] 
















n = -0.3 0.320 0.320 


0.320 


0.320 


0.312 


0.211 


0.133 


0.055 


0.142 


77 = -0.5 0.485 0.488 


0.492 


0.494 


0.500 


0.405 


0.258 


0.083 


0.272 


T) = -0.8 0.748 0.757 


0.763 


0.768 


0.769 


0.605 


0.494 


0.183 


0.413 



power of the weighted exponential tests seems to increase as c increases. 
However, the trend is rather weak. In many cases, the difference in power is 
less than 0.01. This suggests that the direction of the test (specifically, least 
favorable curve in this paper), rather than the scale of the curve, is most 
critical for the power of the weighted exponential test. 
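The Monte Carlo errors quoted for Table 1 are consistent with the binomial standard error $\sqrt{p(1-p)/B}$ of an empirical rejection rate over $B = 1000$ replications, with $p = 0.5$ giving the worst case of about 0.016. A quick check, where the function name is only illustrative:

```python
import math


def mc_standard_error(p, n_rep):
    """Binomial standard error of an empirical rejection rate based on n_rep replications."""
    return math.sqrt(p * (1.0 - p) / n_rep)


if __name__ == "__main__":
    # Nominal sizes used in the table, plus the worst case p = 0.5.
    for p in (0.05, 0.10, 0.50):
        print(p, round(mc_standard_error(p, 1000), 3))
```

Differences between table entries that are smaller than roughly twice these values are within simulation noise.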

6. Discussion. In this paper, we consider tests of hypotheses when the
parameters are not identifiable under the null in semiparametric models.
Our optimality results apply to a large class of semiparametric testing
problems under loss of identifiability, where nuisance parameters may not be
root-n estimable either under the null or alternative. We note that our
current regularity conditions are not directly applicable for testing under loss of
identifiability when the parameter of interest is not root-n estimable. One
example is testing for homogeneity in mixture models, where the usual first
order Taylor approximation may not be possible [Chen, Chen and Kalbfleisch
(2004); Chernoff and Lander (1995); Dacunha-Castelle and Gassiat (1999);
Lindsay (1995); Liu and Shao (2003)]. A higher order expansion is required.
Although not directly covered by our framework, the homogeneity tests may
possess a uniform quadratic expansion [Zhu and Zhang (2006)], thus
permitting a generalization of our results to general quadratic expansions. In the
following, we conclude the paper with a brief discussion of this generalization.

To be concrete, let us consider a two-component mixture with density
$g(p, \mu_1, \mu_2, \eta) = p f(\mu_1, \eta) + (1 - p) f(\mu_2, \eta)$, where $f(\mu, \eta)$ is a parametric
p.d.f. with parameters $\mu \in \mathbb{R}^p$ and a Euclidean parameter $\eta$, such as a location-scale family. Let
$\beta = \mu_2 - \mu_1$, $\theta = (p, \beta', \mu_1', \eta')'$, and the hypothesis of interest is $\beta = 0$; that
is, there is only a single component in the mixture. For convenience, we
assume the mixing proportion $p \in (0, 1]$ and $\mu_1 = \mu_2 = \mu_0$ under the null.

In this example, $p$ is not identifiable and $\mu_1$ and $\mu_2$ are mutually
indistinguishable under the null. Simple algebra shows that the information matrix
for $\psi = (\beta, \mu_1)$ is singular under the null, for arbitrary values of $p$, which
corresponds to the fact that $\mu_1$ and $\mu_2$ are not root-n estimable [Chen and Chen
(2003); Zhu and Zhang (2004)]. We consider the following reparameterization:
$\gamma_1 = (1 - p)(\mu_1 - \mu_0) + p(\mu_2 - \mu_0)$ and $\nu = (1 - p)(\mu_1 - \mu_0)^2 + p(\mu_2 - \mu_0)^2$,
which can be considered as "mixed mean" and "mixed variance". Let $\beta =
(\gamma_1, \nu)$ and $\psi = (\gamma_1, \nu, \eta)$. We can establish the identifiability of $\psi$ and the
consistency and the root-n rate of the MLE of $\psi$ under the null. Furthermore,
under a set of assumptions on the parameter space [e.g., the cone condition
in Andrews (1999, 2001)] and the stochastic differentiability and
equicontinuity of the involved functions, we can establish the following quadratic
expansion of the loglikelihood with respect to $\psi$:

\[
L_n(\psi, \zeta) = L_n(\psi_0, \zeta) + (\psi - \psi_0)' S_{n\zeta}(\psi_0)
+ \tfrac{1}{2}(\psi - \psi_0)' B_\zeta(\psi_0)(\psi - \psi_0) + r_n(\zeta),
\]

where $r_n(\zeta) = o_P(1)$, and $S_{n\zeta}(\cdot)$ and $B_\zeta(\cdot)$ are different from but similar in
structure to the score and information processes for $\psi$ indexed by $\zeta$.
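To make the "mixed mean" and "mixed variance" reparameterization concrete, the following sketch evaluates both quantities for scalar component means. It follows the formulas as stated in the text; the function name and the numerical values are illustrative assumptions only.

```python
def mixed_mean_variance(p, mu1, mu2, mu0):
    """Return the (mixed mean, mixed variance) pair for scalar component means.

    Follows the reparameterization in the text:
      mixed mean     = (1 - p) * (mu1 - mu0) + p * (mu2 - mu0)
      mixed variance = (1 - p) * (mu1 - mu0)**2 + p * (mu2 - mu0)**2
    """
    d1, d2 = mu1 - mu0, mu2 - mu0
    return (1.0 - p) * d1 + p * d2, (1.0 - p) * d1 ** 2 + p * d2 ** 2


# Under the null (mu1 == mu2 == mu0) both quantities vanish for every p,
# so the null corresponds to the single point (0, 0) in the new coordinates.
null_points = {mixed_mean_variance(p, 1.5, 1.5, 1.5) for p in (0.1, 0.5, 1.0)}
```

The set `null_points` collapses to one element, which illustrates why the new coordinates remain identifiable under the null even though $p$ itself is not.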

When the nuisance parameter $\eta$ is not present, a similar weight as in
the current paper for ip can be chosen as Q^(-) = (•)■ The corresponding 
weighted exponential tests are still optimal in the Neyman-Pearson sense. If
$\eta$ is present, a uniformly least favorable curve for this quadratic expansion
with respect to $\beta$ would need to be characterized. This is beyond the scope
of the current paper but is an interesting topic for future research.

7. Proofs. 

Proof of Lemma 1. Since is linear, continuously invertible and 
one-to-one, the tangent set for 7] and rj are identical. By the chain rule, 
%(7) = Iri^ (t) f° r any 7 in the tangent set of fj. The efficient score for 
(3 with the parameter (/?, 77, £) is: lp(/3, rj, C) = (I — Z r? (Z*Z T? ) —1 Z*)Z j g('0, Q and
with the parameter ((3,r),Q i s: {I — %(^r%) O^3C0> 0- The efficient score
function is invariant under such reparameterizations since 

/ - irfifyrf)- 1 ^, = 1- i^^^ip^r 1 ^;^, C) 
=i-i v (t r] i r] r i t v ^x), 

and £/3(ip,C) = IpfyiC)- That the efficient information matrix is invariant 
under reparameterizations thus follows from its definition. □ 

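Proof of Lemma 2. It suffices to show that under the full rank
reparameterization, for any random sequence $\beta_n \to_P \beta_0$,

\[
\log pl_n(\beta_n, \zeta) = \log pl_n(\beta_0, \zeta) + n(\beta_n - \beta_0)' \mathbb{P}_n \tilde\ell_\beta(\psi_0, \zeta)
- \tfrac{1}{2} n (\beta_n - \beta_0)' \tilde I_\beta(\psi_0, \zeta)(\beta_n - \beta_0)
+ o_{P_0}\bigl(\sqrt{n}\|\beta_n - \beta_0\| + 1\bigr)^2. \tag{11}
\]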

By assumptions B2, B4 and the dominated convergence theorem, for ev- 
ery (tj,fj) - (ft,/%,77 ) ^ °' we haw W&M.C) ~Wo,C)) 2 = o S (l). 
Similarly, we have Po£(i,P,rj,() — PoKflo,Po,Vo,() = The derivative 

of the function 1 i— ► log £(t, ip , Q satisfies Pq£((3o, ip , Q = — ip(ipQ, Q. These 
facts, together with the empirical process conditions, imply that for every 
random sequence (t,P,rj) -> (Po,Po,Vo), G n £(i,(3,rf) - G n ^(^ ,C) = °l (l) 
and ¥ n £(t, $,rj, () +Ip(ip Q , £) = of, (I). The subsequent steps of the proof are 
similar to those used in the proof of Theorem 1 in Murphy and van der Vaart 
(2000), and we omit the details. □ 

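Proof of Theorem 1. The proof takes several steps. We first show the
asymptotic equivalence of these statistics, which is summarized in
Lemma 5 below. With a small abuse of notation, let

\[
PLR_n = \int \frac{pl_n(\beta_0 + h/\sqrt{n}, \zeta)}{pl_n(\beta_0, \zeta)}\, dQ_\zeta(h)\, dJ(\zeta).
\]

This is the profile likelihood ratio of the alternative over the null
and it can be approximated by

\[
\widetilde{PLR}_n = \int \exp\bigl\{\tfrac{1}{2}\tilde\beta_n(\theta_0(\zeta))' \tilde I_\beta(\theta_0(\zeta)) \tilde\beta_n(\theta_0(\zeta))\bigr\}
\exp\bigl\{-\tfrac{1}{2}(\tilde\beta_n(\theta_0(\zeta)) - h)' \tilde I_\beta(\theta_0(\zeta)) (\tilde\beta_n(\theta_0(\zeta)) - h)\bigr\}\, dQ_\zeta(h)\, dJ(\zeta),
\]

with the linear statistic $\tilde\beta_n(\theta_0(\zeta)) = \sqrt{n}\, \tilde I_\beta^{-1}(\theta_0(\zeta)) \mathbb{P}_n \tilde\ell_\beta(\theta_0(\zeta))$. An
approximate exponential Wald statistic $\widetilde{EW}_n$ is defined as

\[
\widetilde{EW}_n = (1 + c)^{-p/2} \int \exp\Bigl(\frac{1}{2}\frac{c}{1 + c} \widetilde W_n(\zeta)\Bigr)\, dJ(\zeta),
\]

where $\widetilde W_n(\zeta) = \tilde\beta_n(\theta_0(\zeta))' \tilde I_\beta(\theta_0(\zeta)) \tilde\beta_n(\theta_0(\zeta))$.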
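Numerically, an exponential average statistic of the form $(1+c)^{-p/2}\int \exp\{\tfrac{1}{2}\tfrac{c}{1+c} W(\zeta)\}\,dJ(\zeta)$ can be approximated by replacing the integral over $J$ with a weighted sum on a grid of $\zeta$ values. The sketch below is an illustration under stated assumptions: the grid, the uniform default weights and the function name are hypothetical, the Wald-type values `w_values` are taken as given, and only finite $c \ge 0$ is handled (the case $c = \infty$ needs a separate limiting form that is not covered here).

```python
import math


def exp_average_statistic(w_values, c, p, weights=None):
    """Approximate (1 + c)^(-p/2) * int exp(c / (2 (1 + c)) * W(zeta)) dJ(zeta) on a grid.

    w_values: quadratic (Wald-type) statistics W(zeta_k) on a grid of zeta values.
    weights:  probabilities J assigns to the grid points (uniform if omitted).
    """
    m = len(w_values)
    if weights is None:
        weights = [1.0 / m] * m  # uniform J on the grid (assumption)
    scale = c / (2.0 * (1.0 + c))
    avg = sum(j * math.exp(scale * w) for j, w in zip(weights, w_values))
    return (1.0 + c) ** (-p / 2.0) * avg
```

For example, `exp_average_statistic(w_grid, c=1.0, p=1)` averages `exp(W / 4)` over the grid and rescales by $2^{-1/2}$; at $c = 0$ the statistic is identically 1, so that case is only meaningful through a limiting normalization.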

Now we show the asymptotic distribution of these tests under the null 

hypothesis. Assume without loss of generality that f3 n and ip n take their 
values in U and V as defined in assumption B4, respectively. Following 

Lemma 3.2 in Murphy and van der Vaart (1997), we have G n (£([3 n ,ip n X) — 
£/3(ipo, C)) — >Po 0. Thus £p(ip0i C) — h(@o(0) is -Po-Donsker as a class indexed 
by (" G S and EW n — >d ex( c ) by the continuous mapping theorem. Lemma 
5 below then gives the desired results of Theorem 1. □ 

Lemma 5. Under the null hypothesis and assumptions A-C, (1) $PLR_n - \widetilde{PLR}_n \to_{P_0} 0$,
(2) $\widetilde{PLR}_n = \widetilde{EW}_n$, (3) $\widetilde{EW}_n - EW_n \to_{P_0} 0$, (4) $EW_n - ER_n \to_{P_0} 0$
and (5) $ER_n - ELR_n \to_{P_0} 0$.

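Proof. For notational simplicity, let $\tilde\beta_n = \tilde\beta_n(\theta_0(\zeta))$ and $I_0 = \tilde I_\beta(\theta_0(\zeta))$.
We first show (1). For $0 < M < \infty$, define

\[
PLR_n(M) = \int_{\zeta \in \Xi} \int_{\|h\| \le M} \frac{pl_n(\beta_0 + h/\sqrt{n}, \zeta)}{pl_n(\beta_0, \zeta)}\, dQ_\zeta(h)\, dJ(\zeta),
\]

and

\[
\widetilde{PLR}_n(M) = \int_{\zeta \in \Xi} \int_{\|h\| \le M} \exp\bigl(\tfrac{1}{2}\tilde\beta_n' I_0 \tilde\beta_n\bigr)
\exp\bigl(-\tfrac{1}{2}(\tilde\beta_n - h)' I_0 (\tilde\beta_n - h)\bigr)\, dQ_\zeta(h)\, dJ(\zeta).
\]

Note that for any $M > 0$,

\[
|PLR_n - \widetilde{PLR}_n| \le |PLR_n - PLR_n(M)| + |PLR_n(M) - \widetilde{PLR}_n(M)|
+ |\widetilde{PLR}_n - \widetilde{PLR}_n(M)|.
\]

Hence it suffices to show that (i) $|PLR_n - PLR_n(M)| \to_{P_0} 0$, (ii) $|\widetilde{PLR}_n -
\widetilde{PLR}_n(M)| \to_{P_0} 0$ and (iii) $|PLR_n(M) - \widetilde{PLR}_n(M)| \to_{P_0} 0$, as $n \to \infty$ and
$\forall M : 0 < M < \infty$. To show (i), for any $\varepsilon > 0$,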

Pv(\PLR n - PLR n {M)\ > e) 

<e^P Q \PLR n -PLR n {M)\ 

(12) =e-W / PW ] + ( h '^° d Qd h)d m
(13) < £ -W / A , ( ti C) A C) dQdh)dJ(()

(14) -e"^/ [ (l + o p (l))dQ c (h)dJ(() 

^Ces J\\h\\>M 

(15) =e~ x f I dQ c (h)dJ(()+o(l), 

J Ce3 J \\h\\>M 

where (12) uses assumption C and (13) holds by definition of the profile
likelihood. (14) holds by assumption B3 and Lemma 2. (15) holds by Fubini's
theorem. The right-hand side of (15) can be made arbitrarily small for all $n$
by taking $M$ large enough, since $Q_\zeta$ is a uniformly tight measure.
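For (ii), we have

\[
|\widetilde{PLR}_n - \widetilde{PLR}_n(M)|
= \int_{\zeta \in \Xi} \exp\bigl(\tfrac{1}{2} n\, \mathbb{P}_n \tilde\ell_\beta(\theta_0(\zeta))' I_0^{-1} \mathbb{P}_n \tilde\ell_\beta(\theta_0(\zeta))\bigr)
\int_{\|h\| > M} \exp\bigl(-\tfrac{1}{2}(\tilde\beta_n - h)' I_0 (\tilde\beta_n - h)\bigr)\, dQ_\zeta(h)\, dJ(\zeta) \tag{16}
\]
\[
\le \exp\Bigl(\tfrac{1}{2} \sup_{\zeta \in \Xi} \|\sqrt{n}\, \mathbb{P}_n \tilde\ell_\beta(\theta_0(\zeta))\|^2 \sup_{\zeta \in \Xi} \|I_0^{-1}\|\Bigr)
\int_{\zeta \in \Xi} \int_{\|h\| > M} dQ_\zeta(h)\, dJ(\zeta).
\]

In the inequality, $\|\sqrt{n}\, \mathbb{P}_n \tilde\ell_\beta(\theta_0(\zeta))\|^2 = O_P(1)$ follows from assumption B4. The
fact that $\|I_0^{-1}\| = O_P(1)$ follows from assumption A. The last term
$\int_{\zeta \in \Xi} \int_{\|h\| > M} dQ_\zeta(h)\, dJ(\zeta) \to 0$, as $M \to \infty$. Hence (16) $= o_P(1)$, as $M \to \infty$.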

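Now we show (iii). For contiguous sequences $\beta_0 + h/\sqrt{n} \to_{P_0} \beta_0$ and
$\|h\| \le M$, Lemma 2 yields the following expansion of the profile likelihood
under the null:

\[
\log pl_n(\beta_0 + h/\sqrt{n}, \zeta) - \log pl_n(\beta_0, \zeta)
= \sqrt{n}\, h' \mathbb{P}_n \tilde\ell_\beta(\theta_0(\zeta)) - \tfrac{1}{2} h' I_0 h + o_{P_0}(1)
= \tfrac{1}{2}\tilde\beta_n' I_0 \tilde\beta_n - \tfrac{1}{2}(\tilde\beta_n - h)' I_0 (\tilde\beta_n - h) + o_{P_0}(1),
\]

therefore,

\[
PLR_n(M) = \int_{\zeta \in \Xi} \int_{\|h\| \le M} \frac{pl_n(\beta_0 + h/\sqrt{n}, \zeta)}{pl_n(\beta_0, \zeta)}\, dQ_\zeta(h)\, dJ(\zeta)
= \int_{\zeta \in \Xi} \int_{\|h\| \le M} \exp\bigl(\tfrac{1}{2}\tilde\beta_n' I_0 \tilde\beta_n - \tfrac{1}{2}(\tilde\beta_n - h)' I_0 (\tilde\beta_n - h) + o_{P_0}(1)\bigr)\, dQ_\zeta(h)\, dJ(\zeta)
= \widetilde{PLR}_n(M) + o_P(1),
\]

where the last equality follows from $\widetilde{PLR}_n(M) = O_P(1)$, by arguments
analogous to those used in (16) above. The proof for Part (1) is now completed.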
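For Part (2), since $h \sim Q_\zeta = N(0, c I_0^{-1})$,

\[
\widetilde{PLR}_n = \int \Bigl[(2\pi)^{-p/2} \det{}^{1/2}(I_0/c)
\int \exp\Bigl\{\tfrac{1}{2}\tilde\beta_n' I_0 \tilde\beta_n - \tfrac{1}{2}(h - \tilde\beta_n)' I_0 (h - \tilde\beta_n) - \tfrac{1}{2c} h' I_0 h\Bigr\}\, dh\Bigr]\, dJ(\zeta)
= \int (1 + c)^{-p/2} \exp\Bigl(\frac{1}{2}\frac{c}{1 + c} \tilde\beta_n' I_0 \tilde\beta_n\Bigr)\, dJ(\zeta),
\]

where the last equality holds by integrating out a normal density.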
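The step of integrating out a normal density can be checked numerically in the scalar case $p = 1$: averaging $\exp\{\tfrac{1}{2} I \beta^2 - \tfrac{1}{2} I (h - \beta)^2\}$ over $h \sim N(0, c/I)$ should return $(1+c)^{-1/2}\exp\{\tfrac{1}{2}\tfrac{c}{1+c} I \beta^2\}$. The sketch below uses a plain Riemann sum; the function names, the integration range and the test values are illustrative assumptions.

```python
import math


def gaussian_prior_integral(beta, info, c, half_width=40.0, steps=200001):
    """Riemann-sum approximation (p = 1) of
    int exp(0.5*info*beta^2 - 0.5*info*(h - beta)^2) dN(0, c/info)(h)."""
    dh = 2.0 * half_width / (steps - 1)
    norm = math.sqrt(info / (2.0 * math.pi * c))  # N(0, c/info) density constant
    total = 0.0
    for k in range(steps):
        h = -half_width + k * dh
        log_lr = 0.5 * info * beta ** 2 - 0.5 * info * (h - beta) ** 2
        total += math.exp(log_lr - info * h * h / (2.0 * c)) * norm * dh
    return total


def closed_form(beta, info, c, p=1):
    """(1 + c)^(-p/2) * exp(0.5 * c / (1 + c) * beta' I beta), written for scalar beta."""
    return (1.0 + c) ** (-p / 2.0) * math.exp(0.5 * c / (1.0 + c) * info * beta ** 2)
```

Comparing `gaussian_prior_integral(0.7, 2.0, 3.0)` with `closed_form(0.7, 2.0, 3.0)` gives agreement to numerical precision, which is the completing-the-square identity behind Part (2).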

For Part (3), it follows from Lemma 2 and assumption B3 that $\sqrt{n}\|\hat\beta_n(\zeta) -
\beta_0\| = O_P(1)$, and reapplication of Lemma 2 and the argmax theorem yields

$\sqrt{n}(\hat\beta_n(\zeta) - \beta_0) = \{\tilde I_\beta(\theta_0(\zeta))\}^{-1} \sqrt{n}\, \mathbb{P}_n \tilde\ell_\beta(\theta_0(\zeta)) + o_P(1)$. Part (3) now follows.

For the proof of Part (4) and Part (5), it suffices to show that $W_n(\zeta) -
R_n(\zeta) = o_P(1)$ and $R_n(\zeta) - LR_n(\zeta) = o_P(1)$. These results follow from
Donsker properties and standard arguments. We omit the details. The proof
of Lemma 5 is thus complete. □

Proof of Corollary 1. The proof is similar to the proof of Theorem 
1. We omit the details. □ 



Proof of Corollary 2. The proof follows the same lines as the proof 
of Part (2)(iii) of Lemma 5, with 



^(g c ,C) = (2vr)-^ 2 det 1 / 2 ('i±^/ 



(17) 



x / exp 



1 + c 
2c 



A 



3 ;P n ) M A 



3 -Pr 



1 + C V V 1 + c 



dX, 



where det is the determinant of a matrix, (•,•)»? is the inner product de- 
fined on Tirj, and W(q^, Q < 1 since — <j£, Pl*l v (q^ — q^)')n is nonnegative 
definite. □

Lemma 6. Under assumptions A-C, the densities $\ell_n(\psi_0 + h/\sqrt{n}, \zeta)$ and
$\int \ell_n(\psi_0 + h/\sqrt{n}, \zeta)\, dQ_\zeta(h)\, dJ(\zeta)$ are contiguous to the densities $\ell_n^0$. As a
consequence, the results of Lemma 5 still hold under local alternatives $\{P^n_{\psi_0 + h/\sqrt{n}, \zeta}\}$
and $\{\int P^n_{\psi_0 + h/\sqrt{n}, \zeta}\, dQ_\zeta(h)\, dJ(\zeta)\}$.

Proof. Assumption C implies that a LAN (local asymptotic normal)
expansion for the log-likelihood ratio holds immediately by Lemma 3.10.11
of van der Vaart and Wellner (1996):

\[
\Lambda_{n\zeta} = \log\Bigl(\frac{dP^n_{\psi_0 + h/\sqrt{n}, \zeta}}{dP^n_0}\Bigr)
= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} A_\zeta h(X_i) - \frac{1}{2}\|A_\zeta h\|^2 + o_{P_0}(1).
\]

It follows from LAN that $\Lambda_{n\zeta} \rightsquigarrow W_\zeta$, where $W_\zeta \sim N(-\tfrac{1}{2}\|A_\zeta h\|^2, \|A_\zeta h\|^2)$,
under $P_0$. Therefore, under $P_0$,

\[
\frac{dP^n_{\psi_0 + h/\sqrt{n}, \zeta}}{dP^n_0} = \exp(\Lambda_{n\zeta}) \rightsquigarrow \exp W_\zeta.
\]

$P_0(\exp(W_\zeta)) = 1$, using the formula for the moment generating function
of the normal distribution. By Le Cam's first lemma [van der Vaart (1996),
page 88], we conclude that the sequences of probability measures $\{P^n_{\psi_0 + h/\sqrt{n}, \zeta}\}$
and $\{P^n_0\}$ are contiguous, for every $\zeta \in \Xi$. Consequently the convergence in
probability that holds under $P_0$ also holds under $\{P^n_{\psi_0 + h/\sqrt{n}, \zeta}\}$ and vice
versa. Similarly, since the limit of the averaged likelihood ratio has expectation
one, using the formula for the moment generating function of the $\chi^2$ distribution,
we conclude that the sequences
$\{\int P^n_{\psi_0 + h/\sqrt{n}, \zeta}\, dQ_\zeta(h)\, dJ(\zeta)\}$ and $\{P^n_0\}$ are contiguous. □
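For reference, the two moment generating function identities invoked in this proof can be written out explicitly. The symbols $\sigma^2$ (standing for $\|A_\zeta h\|^2$) and $\chi^2_p$ (a chi-square variable with $p$ degrees of freedom, taken here as the null limit of the quadratic statistic at a fixed $\zeta$) are shorthand introduced for this illustration, and the second line assumes the Gaussian weight $N(0, cI_0^{-1})$ used in Lemma 5.

```latex
\begin{align*}
W \sim N(-\sigma^2/2,\ \sigma^2)
  &\implies E\,e^{W} = \exp\bigl(-\sigma^2/2 + \sigma^2/2\bigr) = 1, \\
E \exp(t \chi^2_p) = (1 - 2t)^{-p/2},\ t < 1/2
  &\implies (1 + c)^{-p/2}\, E \exp\Bigl(\frac{c}{2(1 + c)} \chi^2_p\Bigr)
   = (1 + c)^{-p/2} (1 + c)^{p/2} = 1.
\end{align*}
```

In both cases the limiting likelihood ratio has mean one, which is the condition Le Cam's first lemma requires for contiguity.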

Proof of Theorem 2. We define a y^-neighborhood of (3q as a col- 
lection of sequences n (hp) = 0o + hp/yfn + o(n -1 / 2 ), for hp G M. p . A y/n 
neighborhood of n is similarly defined as r] n (h v ) = r\ + h Tj j \fn + o(n~ 1 / 2 ) , for 
h v G Tirj. With a minor abuse of notation, a local form of the hypotheses can 
be written as: 

(18) H : ip = 4> vs. Hi:il> = ip -\-hi/^, 

where h\ G M p x 7^ takes the value (hpi, h v i), with /i^i = q'^hpi. We note 
that the least favorable direction is invariant under the choice of and, 
as a consequence, the contiguous alternative H\ is also invariant under the 
choice of 
Define

\[
LR_n = \int \frac{\ell_n(\psi_0 + h_1/\sqrt{n}, \zeta)}{\ell_n(\psi_0, \zeta)}\, dQ_\zeta(h_{\beta 1})\, dJ(\zeta). \tag{19}
\]

(21)



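A test defined by $LR_n$ rejects the null with probability

\[
\begin{cases}
1, & \text{if } LR_n > k_{\alpha n}, \\
\lambda_n, & \text{if } LR_n = k_{\alpha n}, \\
0, & \text{if } LR_n < k_{\alpha n},
\end{cases}
\]

where $k_{\alpha n} > 0$, $\lambda_n \in [0, 1]$ are constants such that the rejection probability is
$\alpha$ under the null. For notational simplicity, let $P_1^n = \int P^n_{\psi_0 + h_1/\sqrt{n}, \zeta}\, dQ_\zeta(h_{\beta 1})\, dJ(\zeta)$.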
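The role of the constants $k_{\alpha n}$ and $\lambda_n$ in a randomized Neyman-Pearson test can be illustrated with a toy discrete statistic, where randomization on the boundary is needed to attain the level exactly. This is only a sketch: the binomial null distribution, the function names and the level are hypothetical choices, unrelated to the statistics of the paper.

```python
from math import comb


def randomized_test_constants(null_pmf, alpha):
    """Find (k, lam) with P0(T > k) + lam * P0(T = k) = alpha for a discrete statistic T.

    null_pmf maps each attainable value of T to its null probability.
    """
    tail = 0.0  # running value of P0(T > t)
    for t in sorted(null_pmf, reverse=True):
        p_t = null_pmf[t]
        if tail + p_t > alpha:
            return t, (alpha - tail) / p_t
        tail += p_t
    raise ValueError("alpha must be smaller than the total null mass")


def rejection_probability(t_obs, k, lam):
    """Probability with which the test rejects after observing T = t_obs."""
    if t_obs > k:
        return 1.0
    return lam if t_obs == k else 0.0


# Toy null distribution: T ~ Binomial(5, 1/2).
pmf = {t: comb(5, t) / 32 for t in range(6)}
k, lam = randomized_test_constants(pmf, 0.05)
```

Summing `pmf[t] * rejection_probability(t, k, lam)` over all `t` recovers the level 0.05 exactly; for a statistic with a continuous null distribution the boundary event has probability zero and the choice of the randomization constant is immaterial.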

By the Neyman-Pearson lemma, for all n > 1 and any test n with level a, 
with a minor abuse of notation, 

lim f cj> n [ f £ n @ + h/^OdQcih^dJiC)) dP[ l 
(20) ™' U J 

<Jim 1 7 n |y e^o + ht/^OdQdhpJdJioldP? 

= lim / I(Zi? n > k an ) 

x {/ 4(^ + ^i/v^,C)dQ c (V)^(C)}dA n 

(22) = ,&/{/ /(^>^n)^| o+ , l/ViiC }rfQc(V)^(C) 

(23) =n lh ^J{! I(EW n >k an )dP^ +hi/v -^dQ c (h pi )dJ(0, 

where (21) follows since LR n has an absolutely continuous asymptotic dis- 
tribution under the contiguous alternative H\ and by Fubini's theorem. 
(22) follows since PLR n — LR n = op(l) under Hi, which will be estab- 
lished at the end of the proof. (23) follows from Lemma 6. The results for 
ER n and ELR n also follow from Lemma 6. By Fubini's theorem, we obtain 
limsup n ^ oo /{^ n (P| 0+?ii/ ^ c )}dQ c (V)^(C) < lim*-*, ${$ I{E W n > 
k an )dP^ — }dQc,(hp\)dJ(Q, which implies that the proposed tests 

have the greatest weighted average power asymptotically in the class of all 
tests of asymptotic significance level a, against the alternative P? 

To show $PLR_n - LR_n = o_P(1)$ under $H_1$, it suffices to show $PLR_n - LR_n =
o_P(1)$ under the null by Lemma 6. Define

\[
LR_n(M) = \int_{\zeta \in \Xi} \int_{\|h_1\| \le M} \frac{\ell_n(\psi_0 + h_1/\sqrt{n}, \zeta)}{\ell_n(\psi_0, \zeta)}\, dQ_\zeta(h)\, dJ(\zeta),
\]

and note that $\forall M : 0 < M < \infty$, $|PLR_n - LR_n| \le |PLR_n - PLR_n(M)| + |PLR_n(M) - LR_n(M)| + |LR_n - LR_n(M)|$.
Hence it suffices to show that: (i) $|PLR_n - PLR_n(M)| \to_{P_0} 0$, (ii) $|LR_n -
LR_n(M)| \to_{P_0} 0$ and (iii) $|PLR_n(M) - LR_n(M)| \to_{P_0} 0$, as $n \to \infty$. Part (i)
was shown in Lemma 5. Part (ii) can be similarly established by taking $M$
large enough and using assumption A.
To show Part (iii), we take Taylor expansion of $\log \ell_n(\psi_0 + h_1/\sqrt{n}, \zeta)$ at
$(\psi_0, \zeta)$ with respect to $h_\beta$ along the direction $q_\zeta$, which leads to the following
expansion in the least favorable submodel:

\[
\log \ell_n(\psi_0 + h_1/\sqrt{n}, \zeta) = \log \ell_n(\psi_0, \zeta) + \sqrt{n}\, h_\beta' \mathbb{P}_n \dot\ell(\beta_0, \psi_0, \zeta)
+ \tfrac{1}{2} h_\beta' \mathbb{P}_n \ddot\ell(\tilde\beta, \tilde\psi, \zeta) h_\beta.
\]

On the right-hand side, we can replace $\mathbb{P}_n \dot\ell(\beta_0, \psi_0, \zeta)$ by $\mathbb{P}_n \tilde\ell_\beta(\psi_0, \zeta) + o_P(1)$,
and $\mathbb{P}_n \ddot\ell(\tilde\beta, \tilde\psi, \zeta)$ by $-\tilde I_\beta(\psi_0, \zeta) + o_P(1)$, according to assumption B2.
Comparing the above display and Lemma 2 with $\beta_n = h_{\beta 1}/\sqrt{n}$, we obtain Part
(iii). □



Proof of Theorem 3. The equivalence of the three tests under lo- 
cal alternatives is shown in Lemma 6. To show their asymptotic distribu- 
tion, a key step is to establish that 3„ converges under , , , in dis- 

n iPo+h/y/n,Ci 

tribution to the process £ i— > G(6*o(C)) + v*(hp, C 5 Ci)j where ^(/i^, Ci) = 
^o^(#o(C))^(#o(Ci))'V by Theorem 3.10.12 in van der Vaart and Wellner 
(1996). The result follows by Lemma 6 and the continuous mapping theorem. 

□ 



Proof of Theorem 4. The equivalence of the three tests under local 
alternatives is shown in Lemma 6. Since the sequences of densities / ln(ipo + 
h/y/n,C) dQ((h) dJ(() are contiguous to the density 1^, we have 

[ELR n , * 0+VV ^ ) ^(e X (cUx(c)), 

under P . Then ELR n -^ d rx(c) under / dP^ dQ ( {h) dJ((), by Le 

Cam's third lemma. □ 



Proof of Lemma 3. The proof mainly involves an argument that for
an arbitrary, possibly random sequence $\{\zeta_n\}$, the distance between the
minimizer of the Kullback-Leibler information and $\hat\theta_n(\zeta_n)$ goes to zero.
Consequently, the assertion of Lemma 3 follows from the arbitrariness of the
sequence $\zeta_n$ and Slutsky's theorem. We omit the details. □



Proof of Lemma 4. The proof mainly involves a uniform "peeling
device" with an adaptation of the proof of Theorem 3.2 given in
Murphy and van der Vaart (1999). We omit the details. □






Lemma 7. (1) Assume Cli-> E^, C E is one-to-one, continu- 

ously invertible and onto and tp^:¥,^ cEhF is one-to-one, continuously 
invertible and onto, then ip^ o<^:B^ i— > F is one-to-one, continuously in- 
vertible and onto. (2) Assume <f>t : B^ cDh E^, C E is uniformly Frechet 
differentiable at 8 £ Dj, and ipr : E^, cEhF is uniformly Frechet differ- 
entiable at <Pq{0) over ( £ 5. Then tpt o c^rB^ i— > F is uniformly Frechet 
differentiable at 6 with derivative ?/^(<^(#)) o <p'^{0). 

Proof. For Part (1), it suffices to note that \\Mh (9))M (0)(h))\\ > 
ci||0 c (0)(/t)|| >cic 2 ||/i||. For Part (2), we note that, ^ c o C (6» + th) - tp c o 

M 9 ) = MM 9 ) + tk t) - MM e ))i where = iM 9 + #0 - M )}/ L So 

we rewrite the uniform Frechet difference as tp^((p((6 + h))(-) — ijj((4>((8))(-) = 

MM 9 ))(M 9 + h)- M 9 )) + o-(\\M 9 + h)- M 9 )\\) = MM 9 )) * 

M 9 )(h)+o E (\\h\\)- □ 

Lemma 8. Let $A_\zeta = T_\zeta + K_\zeta : B \mapsto E$ be a linear operator between Banach
spaces, where $T_\zeta$ is onto and there exists $c_1 > 0$, such that $\|T_\zeta h\| \ge c_1\|h\|$ for
all $h \in B$ and $\zeta \in \Xi$, and $K_\zeta$ is uniformly compact, that is, $\bigcup_{\zeta \in \Xi} \bigcup_{\|h\| \le 1} K_\zeta h$
is compact. Then if $N(A_\zeta) = \{0\}$ for all $\zeta \in \Xi$, $A_\zeta$ is onto and there
exists $c_2 > 0$ such that $\|A_\zeta h\| \ge c_2\|h\|$, $\forall \zeta \in \Xi$ and all $h \in B$.

Proof. Since, for an arbitrary random sequence $\zeta_n$, $T_{\zeta_n}^{-1}$ is continuous,
the operator $T_{\zeta_n}^{-1} K_{\zeta_n} : B \mapsto B$ is compact. Hence $I + T_{\zeta_n}^{-1} K_{\zeta_n}$ is one-to-one and
therefore also onto by a result of Riesz for compact operators. Thus $T_{\zeta_n} + K_{\zeta_n}$
is also onto. We will be done if we can show that $I + T_{\zeta_n}^{-1} K_{\zeta_n}$ is continuously
invertible, since that would imply that $(T_{\zeta_n} + K_{\zeta_n})^{-1} = (I + T_{\zeta_n}^{-1} K_{\zeta_n})^{-1} T_{\zeta_n}^{-1}$
is bounded. The remainder of the proof follows the proof of Lemma 6.17 in
Kosorok (2008). □

Lemma 9. Suppose that U n (ip,()(h) = F n v(ipX)(h) and U(t/j,()(h) = 
Pv(tp,C)(h) for given P -measurable functions u(ijj,Q(h) indexed by ^ x H 
and an arbitrary index set Ti^. Assume ip = ij)Q, v{ipQ,C){h) = i/(ipo)(h) . If 
V'n(C) = V'o + Op(l), the class of functions {^(ip, ()(h) — v(ijjo)(h) : \\ip — ^o\\ < 
5, h G TC V , C G S} is P-Donsker for some 5 > and sup^ eS fcgWr) Po(v(ijj, C)(h) - 

v(tjjo)(h)) 2 ->• 0, as^^Vo, */ien sup CeS || y^^n - U)(ip n ((), () - y/n{U n - 
C/)(Vo, OH =0^(1 + ^114(0-^011). 

Proof. This is a "uniform" version of Lemma 3.3.5 in van der Vaart 
and Wellner (1996). Let ^ = {ip : ||i/> — ipo\\ < 5} and define an extraction 
function f:£°°(^fs x HxW,) x ^ 5 i-^ £°°(H V x S) as f(z,tp,()(h) = z(ifj,(,h), 
where z G ^°°(^'5 x 7^ x S). Since / is continuous at every point (z, f/Ji, 0> we
have sup /lgWi £ 6S C h) ~ zirfi, C> Ml ~~ ^ as ?/> — ► Vi- Define the stochas-
tic process Z n (^,(,h) = G n (V(V>, C)(M ~~ ^(V'o, C)(M) indexed by f j x S x 
By assumptions, Z n converges weakly in i°°(^fg x E x 7^) to a tight 
Gaussian process with continuous sample paths with respect to the met- 
ric p c defined by ^((^1, C, ^l), (tp2, C> h 2 )) = P(u(ipi,()(hi) - v(ip ,O{hi) - 
v(ip2,()(h2) + v(6oX)(h2)) 2 , at fixed (. Since as assumed, sup^-^ ^ 63 p^((ip, h), 
(ipo,h)) — ► 0, we have that / is continuous at almost all sample paths of Zq 
uniformly over Q G E. The result now follows by Slutsky's theorem and the 
continuous mapping theorem. □